2. Methods

We formalize the Nanometers pipeline as a sequence of operators acting on a structured problem space. Let $\Pi$ denote the set of all well-formed Erdős problem statements, and let $\mathcal{L}$ denote the space of natural-language mathematical arguments. The pipeline computes a composite map

$$\Phi \colon \Pi \to \mathcal{L} \times \{0, 1\} \tag{1}$$

where the first component is a candidate proof and the second is a verification bit produced by the Lean 4 type-checker. We decompose $\Phi$ into five stages.

2.1 Problem Acquisition

Definition 2.1 (Problem Representation).

A problem instance is a tuple $\pi = (k, \sigma, \tau, M) \in \Pi$, where $k \in \mathbb{N}$ is the catalogue index, $\sigma \in \mathcal{L}$ is the problem statement, $\tau \subseteq \mathcal{T}$ is a set of subject tags from a fixed taxonomy $\mathcal{T}$, and $M \in \{\texttt{open}, \texttt{partial}, \texttt{solved}\}$ is the known status.

The acquisition operator $\texttt{Fetch} \colon \mathbb{N} \to \Pi$ retrieves the problem instance from the Erdős Problems catalogue. Concretely, $\texttt{Fetch}(k)$ issues an HTTP request to erdosproblems.com/k and parses the response into the tuple representation via structured HTML extraction. If the request fails, the system falls back to interactive input via $\texttt{stdin}$.
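
The acquisition stage can be sketched as follows. The `Problem` dataclass mirrors Definition 2.1; the HTML class names in the parser are illustrative assumptions, not the catalogue's actual markup.

```python
import re
import urllib.request
from dataclasses import dataclass, field

@dataclass
class Problem:
    """Tuple representation pi = (k, sigma, tau, M) from Definition 2.1."""
    k: int                                  # catalogue index
    sigma: str                              # problem statement
    tau: set = field(default_factory=set)   # subject tags
    M: str = "open"                         # known status

def parse_problem(k: int, html: str) -> Problem:
    # Hypothetical markup; the real catalogue pages may differ.
    stmt = re.search(r'<div class="statement">(.*?)</div>', html, re.S)
    tags = re.findall(r'<span class="tag">(.*?)</span>', html)
    status = re.search(r'<span class="status">(.*?)</span>', html)
    return Problem(
        k=k,
        sigma=stmt.group(1).strip() if stmt else "",
        tau=set(tags),
        M=status.group(1) if status else "open",
    )

def fetch(k: int) -> Problem:
    """Fetch(k): HTTP request with the stdin fallback on failure."""
    try:
        with urllib.request.urlopen(f"https://erdosproblems.com/{k}", timeout=10) as r:
            return parse_problem(k, r.read().decode())
    except OSError:
        return Problem(k=k, sigma=input(f"Statement for problem {k}: "))
```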

2.2 Parallel Inference

We define two inference operators, $F_R$ (rigorous) and $F_C$ (constructive), each backed by a distinct language model and parameterized by a fixed system prompt:

$$F_R \colon \Pi \to \mathcal{L}, \quad F_R(\pi) = \texttt{GPT-5.4-Pro}(s_R, \pi) \tag{2}$$

$$F_C \colon \Pi \to \mathcal{L}, \quad F_C(\pi) = \texttt{Opus-4.6}(s_C, \pi) \tag{3}$$

Here $s_R, s_C \in \mathcal{L}$ are fixed system prompts that bias each model toward a distinct family of proof strategies.

Definition 2.2 (Strategy Families).

Let $\mathcal{S}$ denote the space of proof strategies applicable to problems in $\Pi$. We distinguish two (not necessarily disjoint) families:
  • $\mathcal{S}_R \subseteq \mathcal{S}$: deductive strategies (induction, contradiction, compactness arguments, applications of Szemerédi regularity)
  • $\mathcal{S}_C \subseteq \mathcal{S}$: constructive strategies (probabilistic method, algebraic constructions, Lovász Local Lemma, entropy compression, container method)

The prompt $s_R$ biases $F_R$ toward outputs employing strategies in $\mathcal{S}_R$, and similarly $s_C$ biases $F_C$ toward $\mathcal{S}_C$. The families may overlap; the prompts encode a soft constraint, not a hard partition.

Both operators execute concurrently. Let $t_R, t_C > 0$ denote the wall-clock times for $F_R(\pi)$ and $F_C(\pi)$, respectively. Parallel execution yields a total inference time of $\max(t_R, t_C)$ rather than $t_R + t_C$.
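
The concurrent execution can be sketched with a thread pool; the two stand-in operators below simulate model latency with sleeps, since the real operators are blocking API calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(f_r, f_c, pi):
    """Execute F_R and F_C concurrently; total time is ~max(t_R, t_C)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_r = pool.submit(f_r, pi)
        fut_c = pool.submit(f_c, pi)
        return fut_r.result(), fut_c.result()

# Stand-ins for the inference operators (the real ones are API requests).
def f_rigorous(pi):
    time.sleep(0.2)
    return f"rigorous proof attempt for {pi}"

def f_constructive(pi):
    time.sleep(0.3)
    return f"constructive proof attempt for {pi}"
```

Threads suffice here because the operators are I/O-bound; the GIL is released while waiting on the network.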

Each operator includes exponential backoff with jitter: upon receiving a rate-limit response (HTTP 429), the $i$-th retry is delayed by $2^{i+1} + \xi$ seconds, where $\xi \sim \mathrm{Uniform}(0, 1)$ and $i \in \{0, 1, 2\}$, giving delays in $[2, 3]$, $[4, 5]$, and $[8, 9]$. If both operators fail after three attempts, the pipeline terminates with a diagnostic.
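
A minimal sketch of the retry schedule; `RateLimitError` is a stand-in for whatever exception the client library raises on HTTP 429.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's HTTP 429 exception."""

def backoff_delay(i: int) -> float:
    """Delay before the i-th retry (i in {0, 1, 2}): 2^(i+1) + Uniform(0, 1)."""
    return 2 ** (i + 1) + random.uniform(0.0, 1.0)

def with_retries(call, max_attempts: int = 3):
    """Retry `call` on rate limiting; re-raise after the final attempt."""
    for i in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if i == max_attempts - 1:
                raise
            time.sleep(backoff_delay(i))
```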

2.3 Synthesis Operator

Given the two proof attempts $P_R = F_R(\pi)$ and $P_C = F_C(\pi)$, the synthesis operator $\Sigma$ constructs a combined argument. We model this as a constrained optimization:

$$\Sigma(\pi, P_R, P_C) = \arg\max_{P \in \mathcal{L}} \; \mu(P \mid \pi) \quad \text{s.t.} \quad \mathrm{supp}(P) \subseteq \mathrm{supp}(P_R) \cup \mathrm{supp}(P_C) \tag{4}$$

Definition 2.3 (Quality Measure and Support).

The quality measure $\mu \colon \mathcal{L} \times \Pi \to [0, 1]$ assigns a score to a proof $P$ relative to a problem $\pi$, encoding logical validity, completeness, and parsimony. The support $\mathrm{supp}(P) \subseteq \mathcal{S}$ is the set of proof strategies, lemmas, and techniques invoked by $P$.

In practice, the optimization in (4) is approximated by a single call to Claude Opus 4.6 acting as a referee. The synthesis prompt presents both $P_R$ and $P_C$ along with the original problem $\pi$, and requests:

  1. Identification of logical errors in $P_R$ and $P_C$ independently
  2. Extraction of the strongest sub-arguments from each attempt
  3. Composition into a single proof $\Sigma(\pi, P_R, P_C)$ with all gaps closed
  4. Rendering of the result in publication-ready LaTeX
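
A sketch of how the referee prompt might be assembled from these four tasks; the template wording is illustrative, not the prompt actually used.

```python
SYNTHESIS_TEMPLATE = """You are refereeing two proof attempts for the problem below.

Problem:
{problem}

Attempt R (rigorous):
{p_r}

Attempt C (constructive):
{p_c}

Tasks:
1. Identify logical errors in each attempt independently.
2. Extract the strongest sub-arguments from each attempt.
3. Compose a single proof with all gaps closed.
4. Render the result in publication-ready LaTeX."""

def build_synthesis_prompt(problem: str, p_r: str, p_c: str) -> str:
    """Assemble the referee prompt presented to the synthesis model."""
    return SYNTHESIS_TEMPLATE.format(problem=problem, p_r=p_r, p_c=p_c)
```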

Remark 2.4 (Extended Thinking Budget).

The synthesis call allocates an extended thinking budget of $B_\Sigma = 65{,}536$ tokens, compared to $B_F = 32{,}768$ for the initial inference calls. This asymmetry reflects the observation that comparative analysis of two mathematical arguments requires deeper sustained reasoning than the generation of a single argument. When extended thinking is enabled, the temperature parameter is fixed at $T = 1.0$ per API constraint.
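
In configuration form, the asymmetry amounts to the following; the field names follow common chat-API conventions and are assumptions, not the exact SDK signature.

```python
# Illustrative request parameters for the two call types.
INFERENCE_PARAMS = {
    "thinking_budget_tokens": 32_768,  # B_F for F_R and F_C
    "temperature": 1.0,                # fixed when extended thinking is on
}
SYNTHESIS_PARAMS = {
    "thinking_budget_tokens": 65_536,  # B_Sigma for the referee call
    "temperature": 1.0,
}
```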

Proposition 2.5 (Monotonicity of Synthesis).

Suppose $\mu(\cdot \mid \pi)$ is monotone with respect to error correction: if $P$ is obtained from $P'$ by correcting a logical error while preserving all valid sub-arguments, then $\mu(P \mid \pi) \geq \mu(P' \mid \pi)$. Then

$$\mu\bigl(\Sigma(\pi, P_R, P_C) \mid \pi\bigr) \geq \max\bigl(\mu(P_R \mid \pi),\, \mu(P_C \mid \pi)\bigr)$$
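
A brief justification sketch, reconstructed from the definitions above rather than taken from a stated proof:

```latex
\begin{proof}[Proof sketch]
Both $P_R$ and $P_C$ are feasible for the program in (4), since each trivially satisfies
$\mathrm{supp}(P_i) \subseteq \mathrm{supp}(P_R) \cup \mathrm{supp}(P_C)$; the exact maximizer
therefore scores at least $\max(\mu(P_R \mid \pi), \mu(P_C \mid \pi))$. For the referee
approximation, which edits the better attempt rather than optimizing over $\mathcal{L}$,
the monotonicity hypothesis guarantees that correcting errors while preserving valid
sub-arguments cannot decrease $\mu$, so the same bound holds.
\end{proof}
```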

2.4 Formal Verification

The verification stage maps natural-language proofs to the type-theoretic domain. We define two operators:

  • $\texttt{Aristotle} \colon \mathcal{L} \to \Lambda \cup \{\bot\}$: the autoformalization operator, mapping natural-language proofs to Lean 4 terms ($\Lambda$) or failure ($\bot$)
  • $\texttt{Lean4} \colon \Lambda \to \{\texttt{ok}, \texttt{err}\}$: the Lean 4 type-checker

Definition 2.6 (Verification Predicate).

For a candidate proof $P \in \mathcal{L}$, define the verification predicate

$$V(P) = \begin{cases} 1 & \text{if } \texttt{Aristotle}(P) \neq \bot \text{ and } \texttt{Lean4}\bigl(\texttt{Aristotle}(P)\bigr) = \texttt{ok} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
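
The predicate composes directly; a minimal sketch, with `None` playing the role of $\bot$ and the two operators passed in as callables (the stubs in the test stand in for the real Aristotle and Lean 4 toolchains):

```python
from typing import Callable, Optional

def make_verifier(aristotle: Callable[[str], Optional[str]],
                  lean4: Callable[[str], str]) -> Callable[[str], int]:
    """Build V: V(P) = 1 iff autoformalization succeeds and the term type-checks.
    `aristotle` returns Lean 4 source or None (the failure value, ⊥)."""
    def V(p: str) -> int:
        term = aristotle(p)
        if term is None:          # Aristotle(P) = ⊥
            return 0
        return 1 if lean4(term) == "ok" else 0
    return V
```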

The composite pipeline then yields the output pair

$$\Phi(\pi) = \Bigl(\,\Sigma\bigl(\pi,\, F_R(\pi),\, F_C(\pi)\bigr),\;\; V\bigl(\Sigma(\pi,\, F_R(\pi),\, F_C(\pi))\bigr)\,\Bigr) \tag{6}$$

Remark 2.7 (Soundness).

Verification is one-sided: $V(P) = 1$ implies that $P$ is a valid proof of $\pi$ (under the assumption that the Lean 4 kernel is sound). However, $V(P) = 0$ does not imply invalidity. The autoformalization operator may return $\bot$ for arguments that rely on informal reasoning or results not yet in Mathlib.

2.5 Artifact Generation

Each run of $\Phi(\pi)$ produces a complete audit trail. Let $d$ be the execution date. The output directory outputs/{d}_problem{k}/ contains:

  • prompts.json: all system and user prompts $s_R, s_C$ with model parameters
  • gpt54pro_raw.md, opus46_raw.md: unmodified model outputs $P_R, P_C \in \mathcal{L}$
  • synthesis.md: the referee analysis and combined proof $\Sigma(\pi, P_R, P_C)$
  • proof.tex: self-contained LaTeX document with amsmath, amsthm preamble
  • formalization.lean: Lean 4 source (present when $\texttt{Aristotle}(P) \neq \bot$)
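
The artifact layout above can be sketched as a single writer function; the signature and argument names are illustrative.

```python
import json
from pathlib import Path
from typing import Optional

def write_artifacts(base: Path, d: str, k: int, prompts: dict,
                    p_r: str, p_c: str, synthesis: str,
                    proof_tex: str, lean_src: Optional[str]) -> Path:
    """Write the per-run audit trail to base/{d}_problem{k}/."""
    out = base / f"{d}_problem{k}"
    out.mkdir(parents=True, exist_ok=True)
    (out / "prompts.json").write_text(json.dumps(prompts, indent=2))
    (out / "gpt54pro_raw.md").write_text(p_r)
    (out / "opus46_raw.md").write_text(p_c)
    (out / "synthesis.md").write_text(synthesis)
    (out / "proof.tex").write_text(proof_tex)
    if lean_src is not None:  # only when Aristotle(P) != ⊥
        (out / "formalization.lean").write_text(lean_src)
    return out
```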