We formalize the Nanometers pipeline as a sequence of operators acting on a structured problem space. Let Πdenote the set of all well-formed Erdős problem statements, and let L denote the space of natural-language mathematical arguments. The pipeline computes a composite map
Φ:Π→L×{0,1}
(1)
where the first component is a candidate proof and the second is a verification bit produced by the Lean 4 type-checker. We decompose Φ into five stages.
2.1 Problem Acquisition
Definition2.1(Problem Representation).
A problem instance is a tuple π=(k,σ,τ,M)∈Π, where k∈N is the catalogue index, σ∈L is the problem statement, τ⊆T is a set of subject tags from a fixed taxonomy T, and M∈{open,partial,solved} is the known status.
The acquisition operator Fetch:N→Π retrieves the problem instance from the Erdős Problems catalogue. Concretely, Fetch(k) issues an HTTP request to erdosproblems.com/k and parses the response into the tuple representation via structured HTML extraction. If the request fails, the system falls back to interactive input via stdin.
2.2 Parallel Inference
We define two inference operators, FR (rigorous) and FC (constructive), each backed by a distinct language model and parameterized by a fixed system prompt:
FR:Π→L,FR(π)=GPT-5.4-Pro(sR,π)
(2)
FC:Π→L,FC(π)=Opus-4.6(sC,π)
(3)
Here sR,sC∈L are fixed system prompts that bias each model toward a distinct family of proof strategies.
Definition2.2(Strategy Families).
Let S denote the space of proof strategies applicable to problems in Π. We distinguish two (not necessarily disjoint) families:
The prompt sR biases FR toward outputs employing strategies in SR, and similarly sC biases FC toward SC. The families may overlap; the prompts encode a soft constraint, not a hard partition.
Both operators execute concurrently. Let tR,tC>0 denote the wall-clock times for FR(π) and FC(π) respectively. The parallel execution yields a total inference time of max(tR,tC) rather than tR+tC.
Each operator includes exponential backoff with jitter: upon receiving a rate-limit response (HTTP 429), the i-th retry is delayed by 2i+1+ξ seconds, where ξ∼Uniform(0,1) and i∈{0,1,2}, giving delays in [2,3],[4,5],[8,9]. If both operators fail after three attempts, the pipeline terminates with a diagnostic.
2.3 Synthesis Operator
Given the two proof attempts PR=FR(π) and PC=FC(π), the synthesis operator Σ constructs a combined argument. We model this as a constrained optimization:
The quality measure μ:L×Π→[0,1] assigns a score to a proof P relative to a problem π, encoding logical validity, completeness, and parsimony. The support supp(P)⊆S is the set of proof strategies, lemmas, and techniques invoked by P.
In practice, the optimization in (4) is approximated by a single call to Claude Opus 4.6 acting as a referee. The synthesis prompt presents both PR and PC along with the original problem π, and requests:
Identification of logical errors in PR and PC independently
Extraction of the strongest sub-arguments from each attempt
Composition into a single proof Σ(π,PR,PC) with all gaps closed
Rendering of the result in publication-ready LaTeX
Remark2.4(Extended Thinking Budget).
The synthesis call allocates an extended thinking budget of BΣ=65,536 tokens, compared to BF=32,768 for the initial inference calls. This asymmetry reflects the observation that comparative analysis of two mathematical arguments requires deeper sustained reasoning than the generation of a single argument. When extended thinking is enabled, the temperature parameter is fixed at T=1.0 per API constraint.
Proposition2.5(Monotonicity of Synthesis).
Suppose μ(⋅∣π) is monotone with respect to error correction: if P is obtained from P′ by correcting a logical error while preserving all valid sub-arguments, then μ(P∣π)≥μ(P′∣π). Then
μ(Σ(π,PR,PC)∣π)≥max(μ(PR∣π),μ(PC∣π))
2.4 Formal Verification
The verification stage maps natural-language proofs to the type-theoretic domain. We define two operators:
Aristotle:L→Λ∪{⊥}: the autoformalization operator, mapping natural-language proofs to Lean 4 terms (Λ) or failure (⊥)
Lean4:Λ→{ok,err}: the Lean 4 type-checker
Definition2.6(Verification Predicate).
For a candidate proof P∈L, define the verification predicate
V(P)={10if Aristotle(P)=⊥ and Lean4(Aristotle(P))=okotherwise
(5)
The composite pipeline then yields the output pair
Φ(π)=(Σ(π,FR(π),FC(π)),V(Σ(π,FR(π),FC(π))))
(6)
Remark2.7(Soundness).
Verification is one-sided: V(P)=1 implies that P is a valid proof of π (under the assumption that the Lean 4 kernel is sound). However, V(P)=0 does not imply invalidity. The autoformalization operator may return ⊥ for arguments that rely on informal reasoning or results not yet in Mathlib.
2.5 Artifact Generation
Each run of Φ(π) produces a complete audit trail. Let d be the execution date. The output directory outputs/{d}_problem{k}/ contains:
prompts.json: all system and user prompts sR,sC with model parameters
gpt54pro_raw.md, opus46_raw.md: unmodified model outputs PR,PC∈L
synthesis.md: the referee analysis and combined proof Σ(π,PR,PC)
proof.tex: self-contained LaTeX document with amsmath, amsthm preamble
formalization.lean: Lean 4 source (present when Aristotle(P)=⊥)