2. Methods

We formalize the Nanometers pipeline as a sequence of operators acting on a structured problem space. Let $\Pi$ denote the set of all well-formed Erdős problem statements, and let $\mathcal{L}$ denote the space of natural-language mathematical arguments. The pipeline computes a composite map

$$\Phi \colon \Pi \to \mathcal{L} \times \{0, 1\} \tag{1}$$

where the first component is a candidate proof and the second is a verification bit produced by the Lean 4 type-checker. We decompose $\Phi$ into five stages.

2.1 Problem Acquisition

Definition 2.1 (Problem Representation).

A problem instance is a tuple $\pi = (k, \sigma, \tau, M) \in \Pi$, where $k \in \mathbb{N}$ is the catalogue index, $\sigma \in \mathcal{L}$ is the problem statement, $\tau \subseteq \mathcal{T}$ is a set of subject tags from a fixed taxonomy $\mathcal{T}$, and $M \in \{\texttt{open}, \texttt{partial}, \texttt{solved}\}$ is the known status.

The acquisition operator $\texttt{Fetch} \colon \mathbb{N} \to \Pi$ retrieves the problem instance from the Erdős Problems catalogue. Concretely, $\texttt{Fetch}(k)$ issues an HTTP request to erdosproblems.com/k and parses the response into the tuple representation via structured HTML extraction. If the request fails, the system falls back to interactive input via $\texttt{stdin}$.
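
The acquisition stage can be sketched as follows. The `Problem` dataclass mirrors Definition 2.1; the HTML class names in the parser are illustrative assumptions, not the catalogue's actual markup.

```python
import re
import urllib.request
from dataclasses import dataclass, field

@dataclass
class Problem:
    """Tuple representation pi = (k, sigma, tau, M) from Definition 2.1."""
    k: int                                  # catalogue index
    sigma: str                              # problem statement
    tau: set = field(default_factory=set)   # subject tags
    M: str = "open"                         # known status

def parse_problem(k: int, html: str) -> Problem:
    # Hypothetical markup; the real catalogue pages may differ.
    stmt = re.search(r'<div class="statement">(.*?)</div>', html, re.S)
    tags = re.findall(r'<span class="tag">(.*?)</span>', html)
    status = re.search(r'<span class="status">(.*?)</span>', html)
    return Problem(
        k=k,
        sigma=stmt.group(1).strip() if stmt else "",
        tau=set(tags),
        M=status.group(1) if status else "open",
    )

def fetch(k: int) -> Problem:
    """Fetch(k): HTTP request with the stdin fallback on failure."""
    try:
        with urllib.request.urlopen(f"https://erdosproblems.com/{k}", timeout=10) as r:
            return parse_problem(k, r.read().decode())
    except OSError:
        return Problem(k=k, sigma=input(f"Statement for problem {k}: "))
```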

2.2 Parallel Inference

We define two inference operators, $F_R$ (rigorous) and $F_C$ (constructive), each backed by a distinct language model and parameterized by a fixed system prompt:

$$F_R \colon \Pi \to \mathcal{L}, \quad F_R(\pi) = \texttt{GPT-5.4-Pro}(s_R, \pi) \tag{2}$$

$$F_C \colon \Pi \to \mathcal{L}, \quad F_C(\pi) = \texttt{Opus-4.6}(s_C, \pi) \tag{3}$$

Here $s_R, s_C \in \mathcal{L}$ are fixed system prompts that bias each model toward a distinct family of proof strategies.

Definition 2.2 (Strategy Families).

Let $\mathcal{S}$ denote the space of proof strategies applicable to problems in $\Pi$. We distinguish two (not necessarily disjoint) families:
  • $\mathcal{S}_R \subseteq \mathcal{S}$: deductive strategies (induction, contradiction, compactness arguments, applications of Szemerédi regularity)
  • $\mathcal{S}_C \subseteq \mathcal{S}$: constructive strategies (probabilistic method, algebraic constructions, Lovász Local Lemma, entropy compression, container method)

The prompt $s_R$ biases $F_R$ toward outputs employing strategies in $\mathcal{S}_R$, and similarly $s_C$ biases $F_C$ toward $\mathcal{S}_C$. The families may overlap; the prompts encode a soft constraint, not a hard partition.

Both operators execute concurrently. Let $t_R, t_C > 0$ denote the wall-clock times for $F_R(\pi)$ and $F_C(\pi)$, respectively. Parallel execution yields a total inference time of $\max(t_R, t_C)$ rather than $t_R + t_C$.
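
The concurrent execution can be sketched with a thread pool; the two stand-in operators below simulate model latency with sleeps, since the real operators are blocking API calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(f_r, f_c, pi):
    """Execute F_R and F_C concurrently; total time is ~max(t_R, t_C)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_r = pool.submit(f_r, pi)
        fut_c = pool.submit(f_c, pi)
        return fut_r.result(), fut_c.result()

# Stand-ins for the inference operators (the real ones are API requests).
def f_rigorous(pi):
    time.sleep(0.2)
    return f"rigorous proof attempt for {pi}"

def f_constructive(pi):
    time.sleep(0.3)
    return f"constructive proof attempt for {pi}"
```

Threads suffice here because the operators are I/O-bound; the GIL is released while waiting on the network.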

Each operator includes exponential backoff with jitter: upon receiving a rate-limit response (HTTP 429), the $i$-th retry is delayed by $2^{i+1} + \xi$ seconds, where $\xi \sim \mathrm{Uniform}(0, 1)$ and $i \in \{0, 1, 2\}$, giving delays in $[2, 3]$, $[4, 5]$, and $[8, 9]$. If both operators fail after three attempts, the pipeline terminates with a diagnostic.
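
A minimal sketch of the retry schedule; `RateLimitError` is a stand-in for whatever exception the client library raises on HTTP 429.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's HTTP 429 exception."""

def backoff_delay(i: int) -> float:
    """Delay before the i-th retry (i in {0, 1, 2}): 2^(i+1) + Uniform(0, 1)."""
    return 2 ** (i + 1) + random.uniform(0.0, 1.0)

def with_retries(call, max_attempts: int = 3):
    """Retry `call` on rate limiting; re-raise after the final attempt."""
    for i in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if i == max_attempts - 1:
                raise
            time.sleep(backoff_delay(i))
```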

2.3 Synthesis Operator

Given the two proof attempts $P_R = F_R(\pi)$ and $P_C = F_C(\pi)$, the synthesis operator $\Sigma$ constructs a combined argument. We model this as a constrained optimization:

$$\Sigma(\pi, P_R, P_C) = \arg\max_{P \in \mathcal{L}} \; \mu(P \mid \pi) \quad \text{s.t.} \quad \mathrm{supp}(P) \subseteq \mathrm{supp}(P_R) \cup \mathrm{supp}(P_C) \tag{4}$$

Definition 2.3 (Quality Measure and Support).

The quality measure $\mu \colon \mathcal{L} \times \Pi \to [0, 1]$ assigns a score to a proof $P$ relative to a problem $\pi$, encoding logical validity, completeness, and parsimony. The support $\mathrm{supp}(P) \subseteq \mathcal{S}$ is the set of proof strategies, lemmas, and techniques invoked by $P$.

In practice, the optimization in (4) is approximated by a single call to Claude Opus 4.6 acting as a referee. The synthesis prompt presents both $P_R$ and $P_C$ along with the original problem $\pi$, and requests:

  1. Identification of logical errors in $P_R$ and $P_C$ independently
  2. Extraction of the strongest sub-arguments from each attempt
  3. Composition into a single proof $\Sigma(\pi, P_R, P_C)$ with all gaps closed
  4. Rendering of the result in publication-ready LaTeX
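
A sketch of how the referee prompt might be assembled from these four tasks; the template wording is illustrative, not the prompt actually used.

```python
SYNTHESIS_TEMPLATE = """You are refereeing two proof attempts for the problem below.

Problem:
{problem}

Attempt R (rigorous):
{p_r}

Attempt C (constructive):
{p_c}

Tasks:
1. Identify logical errors in each attempt independently.
2. Extract the strongest sub-arguments from each attempt.
3. Compose a single proof with all gaps closed.
4. Render the result in publication-ready LaTeX."""

def build_synthesis_prompt(problem: str, p_r: str, p_c: str) -> str:
    """Assemble the referee prompt presented to the synthesis model."""
    return SYNTHESIS_TEMPLATE.format(problem=problem, p_r=p_r, p_c=p_c)
```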

Remark 2.4 (Extended Thinking Budget).

The synthesis call allocates an extended thinking budget of $B_\Sigma = 65{,}536$ tokens, compared to $B_F = 32{,}768$ for the initial inference calls. This asymmetry reflects the observation that comparative analysis of two mathematical arguments requires deeper sustained reasoning than the generation of a single argument. When extended thinking is enabled, the temperature parameter is fixed at $T = 1.0$ per API constraint.
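
In configuration form, the asymmetry amounts to the following; the field names follow common chat-API conventions and are assumptions, not the exact SDK signature.

```python
# Illustrative request parameters for the two call types.
INFERENCE_PARAMS = {
    "thinking_budget_tokens": 32_768,  # B_F for F_R and F_C
    "temperature": 1.0,                # fixed when extended thinking is on
}
SYNTHESIS_PARAMS = {
    "thinking_budget_tokens": 65_536,  # B_Sigma for the referee call
    "temperature": 1.0,
}
```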

Proposition 2.5 (Monotonicity of Synthesis).

Suppose $\mu(\cdot \mid \pi)$ is monotone with respect to error correction: if $P$ is obtained from $P'$ by correcting a logical error while preserving all valid sub-arguments, then $\mu(P \mid \pi) \geq \mu(P' \mid \pi)$. Then

$$\mu\bigl(\Sigma(\pi, P_R, P_C) \mid \pi\bigr) \geq \max\bigl(\mu(P_R \mid \pi),\, \mu(P_C \mid \pi)\bigr)$$
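
A brief justification sketch, reconstructed from the definitions above rather than taken from a stated proof:

```latex
\begin{proof}[Proof sketch]
Both $P_R$ and $P_C$ are feasible for the program in (4), since each trivially satisfies
$\mathrm{supp}(P_i) \subseteq \mathrm{supp}(P_R) \cup \mathrm{supp}(P_C)$; the exact maximizer
therefore scores at least $\max(\mu(P_R \mid \pi), \mu(P_C \mid \pi))$. For the referee
approximation, which edits the better attempt rather than optimizing over $\mathcal{L}$,
the monotonicity hypothesis guarantees that correcting errors while preserving valid
sub-arguments cannot decrease $\mu$, so the same bound holds.
\end{proof}
```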

2.4 Formal Verification

The verification stage maps natural-language proofs to the type-theoretic domain. We define two operators:

  • $\texttt{Aristotle} \colon \mathcal{L} \to \Lambda \cup \{\bot\}$: the autoformalization operator, mapping natural-language proofs to Lean 4 terms ($\Lambda$) or failure ($\bot$)
  • $\texttt{Lean4} \colon \Lambda \to \{\texttt{ok}, \texttt{err}\}$: the Lean 4 type-checker

Definition 2.6 (Verification Predicate).

For a candidate proof $P \in \mathcal{L}$, define the verification predicate

$$V(P) = \begin{cases} 1 & \text{if } \texttt{Aristotle}(P) \neq \bot \text{ and } \texttt{Lean4}\bigl(\texttt{Aristotle}(P)\bigr) = \texttt{ok} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
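
The predicate composes directly; a minimal sketch, with `None` playing the role of $\bot$ and the two operators passed in as callables (the stubs in the test stand in for the real Aristotle and Lean 4 toolchains):

```python
from typing import Callable, Optional

def make_verifier(aristotle: Callable[[str], Optional[str]],
                  lean4: Callable[[str], str]) -> Callable[[str], int]:
    """Build V: V(P) = 1 iff autoformalization succeeds and the term type-checks.
    `aristotle` returns Lean 4 source or None (the failure value, ⊥)."""
    def V(p: str) -> int:
        term = aristotle(p)
        if term is None:          # Aristotle(P) = ⊥
            return 0
        return 1 if lean4(term) == "ok" else 0
    return V
```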

The composite pipeline then yields the output pair

$$\Phi(\pi) = \Bigl(\,\Sigma\bigl(\pi,\, F_R(\pi),\, F_C(\pi)\bigr),\;\; V\bigl(\Sigma(\pi,\, F_R(\pi),\, F_C(\pi))\bigr)\,\Bigr) \tag{6}$$

Remark 2.7 (Soundness).

Verification is one-sided: $V(P) = 1$ implies that $P$ is a valid proof of $\pi$ (under the assumption that the Lean 4 kernel is sound). However, $V(P) = 0$ does not imply invalidity. The autoformalization operator may return $\bot$ for arguments that rely on informal reasoning or results not yet in Mathlib.

2.5 Artifact Generation

Each run of $\Phi(\pi)$ produces a complete audit trail. Let $d$ be the execution date. The output directory outputs/{d}_problem{k}/ contains:

  • prompts.json: all system and user prompts $s_R, s_C$ with model parameters
  • gpt54pro_raw.md, opus46_raw.md: unmodified model outputs $P_R, P_C \in \mathcal{L}$
  • synthesis.md: the referee analysis and combined proof $\Sigma(\pi, P_R, P_C)$
  • proof.tex: self-contained LaTeX document with amsmath, amsthm preamble
  • formalization.lean: Lean 4 source (present when $\texttt{Aristotle}(P) \neq \bot$)
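
The artifact layout above can be sketched as a single writer function; the signature and argument names are illustrative.

```python
import json
from pathlib import Path
from typing import Optional

def write_artifacts(base: Path, d: str, k: int, prompts: dict,
                    p_r: str, p_c: str, synthesis: str,
                    proof_tex: str, lean_src: Optional[str]) -> Path:
    """Write the per-run audit trail to base/{d}_problem{k}/."""
    out = base / f"{d}_problem{k}"
    out.mkdir(parents=True, exist_ok=True)
    (out / "prompts.json").write_text(json.dumps(prompts, indent=2))
    (out / "gpt54pro_raw.md").write_text(p_r)
    (out / "opus46_raw.md").write_text(p_c)
    (out / "synthesis.md").write_text(synthesis)
    (out / "proof.tex").write_text(proof_tex)
    if lean_src is not None:  # only when Aristotle(P) != ⊥
        (out / "formalization.lean").write_text(lean_src)
    return out
```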