For several months now I'm reading new papers about the Collatz conjecture, and there is some excitement percievable between the lines. At least, I would say, there has been more progress since Terence Tao's paper than in the two, three decades before. As a spectator using AI, I could support the correctness of two of recent papers by providing proof formalizations in the Lean language. This blog post, however, is about refutation of a proof attempt by A. Novgorodtsev (see the preprint here). This preprint caught my interest because it was well-informed and, on first glance, proposed sound arguments. The following is a critique by Opus 4.6 which I think is worth posting and being referred to.
Prompt:
Analyze the attached paper. describe the core argument chain (for the proof of the Collatz conjecture) and rate its soundness. Do not go into details, just look at the general argument.
I've read through the paper carefully. Here's my assessment of the core argument chain and its soundness.
Core Argument Chain
The proof attempts to show two things: no non-trivial cycles exist, and no trajectory escapes to infinity.
The logical skeleton runs roughly as follows:
Step 1 — Ensemble average. The expected 2-adic valuation E[v₂(3n+1)] = 2.0 is proven exactly via geometric series. This gives an average bit loss of ~0.415 per step. This part is correct and well-known.
Step 2 — Exit Dispersal & Bit Separation. Each "exit step" (where v₂ ≥ 3) maps one residue class to ≥ 8 distinct classes. Consecutive blocks induce nested modular refinements because 3^L is invertible mod 2^K. These are legitimate algebraic observations about the structure of the map.
Step 3 — Fresh 3-Bit Constraint. This is the paper's claimed key innovation. The argument: each return of a trajectory to a fixed exit class "consumes" 3 fresh bits of the seed's binary representation. After M returns, the seed is constrained to a set of size 2^{K−1−3M}, which becomes empty when M > (K−6)/3.
Step 4 — Rank Exhaustion → Bounded deficit fraction. Since a K₀-bit seed can sustain at most ~K₀/3 deficit macro-blocks, the empirical deficit fraction ρ̂ is bounded by ~1/3, which is below the critical threshold p* ≈ 0.524 needed to sustain growth. This gives strictly negative effective drift, proving descent.
Where the Argument Breaks Down
The critical weakness is in Step 3 → Step 4, which is where the paper claims to bridge the gap between ensemble behavior and individual trajectories. Several issues:
1. The "fixed exit class" assumption doesn't bind actual trajectories. Lemma 4.1 proves that a trajectory can't return to the same exit class more than ~K/3 times within a K-bit band. But a trajectory doesn't need to return to the same class — there are 2^{K−3} exit classes available. The trajectory can visit different exit classes each time, and the constraint counting doesn't apply across distinct classes in the way the proof needs.
2. The deficit fraction bound doesn't follow. The paper claims that because each exit class can only be visited ~K/3 times, the total number of deficit blocks is bounded by M_max = (K₀−6)/3. But this bounds visits to a single class, not total deficit blocks. The trajectory can accumulate deficit blocks by visiting many different exit classes. The paper never adequately addresses this distinction — it silently equates "consecutive returns to one class" with "total deficit blocks," which is the core non-sequitur.
3. Growth Determinism (Lemma 9.5) doesn't rescue the argument. The paper correctly notes that all bits of future iterates are deterministic functions of the seed. But the claim that "the rank budget is an invariant of the initial seed" conflates the constraint on visits-to-one-class with a constraint on total deficit blocks. When the iterate grows beyond the K₀-bit band, it enters a regime with more exit classes and more room for deficit behavior, and the fixed-precision counting argument no longer confines the global trajectory.
4. The drift calculation assumes what it needs to prove. Step B5 computes Δ_eff using the bound ρ̂ ≤ 1/3, but this bound was derived from the flawed rank-exhaustion step. The entire quantitative closure (the key inequality ρ̂ < p*) inherits this gap.
Rating
The paper contains genuine and correct mathematical content — the ensemble identity, the exit dispersal theorem, the bit separation lemma, and the parity obstruction are all sound. The algebraic machinery is real. The extensive T_m family analysis in the appendices is interesting exploratory work.
However, the central bridge from ensemble to trajectory — the claimed main contribution — has a logical gap. The Fresh 3-Bit Constraint bounds repeated visits to a single exit class, but the proof needs a bound on total deficit blocks across all classes. This gap is papered over rather than resolved, and it is precisely the gap that Tao identified as the core obstruction. The paper acknowledges Tao's barrier but does not actually overcome it.
Soundness: the proof is not valid as stated. The individual lemmas are mostly correct, but the argument chain does not close. This is a sophisticated and well-organized attempt, but it falls into the classic trap of Collatz proof attempts: converting a true statistical statement into a per-trajectory guarantee without sufficient justification.
Prompt:
Identify flaws in the details of the arguments, ie where there are miscalculations, wrong logic steps, unproven assumptions or statements etc
Let me go through the paper section by section and identify specific flaws.
Section 2: Syracuse Map and Block Decomposition
Theorem 2.4 (Bad Class Count): The claim |Bad(k,L)| = 3^L · 2^{k−1−2L} with fraction (3/4)^L is correct for the first step's classification but the proof claims "each Syracuse step constrains 2 bits of the current iterate" and "by Bit Separation, these constraints pull back to 2 independent bits of the seed." This is stated without proof. The pullback to independent bits of the seed requires that the affine maps at each step don't create dependencies between the bit-level constraints. The paper asserts independence but the affine structure (multiplication by 3, addition of 1) creates carries that entangle bits. The theorem is likely true for the stated fraction, but the proof's reasoning about "2 independent bits of the seed" per step is hand-waved.
Theorem 2.5 (E[v₂] = 2.0): Correct. This is a standard and well-known result. The proof is valid.
Section 3: Exit Dispersal and Bit Separation
Theorem 3.1 (Exit Dispersal): The arithmetic is correct. Given c ≡ 5 mod 8, the 2^v lifts at precision K+v do map to 2^v distinct classes mod 2^K with spacing 3·2^{K−v}. This is a legitimate algebraic property.
However, there's a subtle scope issue: the theorem is stated at precision K, but the lifts are taken at precision K+v. The paper uses this at fixed precision K throughout the proof (e.g., in Lemma 4.1), where one doesn't have access to K+v bits — one has exactly K bits. The dispersal describes what happens when you refine precision, not what a fixed-precision trajectory does.
Theorem 3.2 (Bit Separation): The formula x_L = (3^L · n + c_L)/2^{S_L} is correct. The injectivity claim (distinct residues mod 2^{K+S_L} map to distinct residues mod 2^K) follows from gcd(3^L, 2^{S_L}) = 1. This is sound.
Corollary 3.2.1 (Modular refinement, carry-safe): This is where problems begin. The corollary claims the exponents K₀ + r_i "strictly increase" along consecutive certified blocks. This requires that each block's cumulative shift r_i = S_L + v_exit is strictly positive, which is true (since S_L ≥ 1 and v_exit ≥ 3). But the claim that the refinements are "nested" assumes the constraints from consecutive blocks compose cleanly — that is, the constraint from block i+1 refines the constraint from block i by fixing additional bits beyond those already fixed.
The problem: the bits being "fixed" by block i+1 depend on the output of block i, not directly on the seed n₀. The pullback to n₀ goes through the composition of all prior affine maps. While each individual map is invertible mod 2^t, the composition means that the "fresh" bits constrained by block i+1 are entangled with previously constrained bits of n₀ through carries in the affine maps. The corollary asserts this doesn't happen ("robust to carries") but the justification — "all pullback maps are affine with odd multipliers, hence bijective" — only guarantees bijectivity, not that the newly constrained bits are disjoint from previously constrained bits. Bijectivity means no information is lost, but it doesn't mean information is partitioned into independent chunks.
Section 4: The Fresh 3-Bit Constraint
Lemma 4.1 — This is the central claim and has multiple issues:
Flaw 1: "≥ 3 fresh bits per return" is not proven. The argument says: at the exit step, v₂(3x+1) ≥ 3, so there are 2^3 = 8 distinct lifts, and specifying that the orbit returns to class c "selects one such lift and therefore fixes at least three additional 2-adic bits." But this conflates two things: the dispersal tells you that the destination is spread among 8 classes, but requiring the destination to equal a specific class c constrains the bits of the current iterate x, not necessarily 3 fresh bits of the original seed n₀. The pullback to n₀ goes through a chain of affine maps with carries. Each return does impose some constraint on n₀, but the claim that it's exactly "3 fresh bits" (meaning 3 bits independent of all previously constrained bits) is asserted, not proven.
Flaw 2: The shrinkage factor 1/8 per return is an upper bound only under independence. The claim |Ret_M(c)| ≤ 2^{K−1−3M} requires that each return reduces the admissible set by a factor of exactly 1/8. This would be true if each return's constraint were independent of prior constraints. But successive returns through the same class create correlated constraints — the affine map compositions share structure. The paper relies on Bit Separation (bijectivity) to assert independence, but bijectivity ≠ independence of constraint bits.
Flaw 3: The proof says "specifying that the orbit returns to the same exit class c mod 2^K selects one such lift." This is wrong. The orbit returning to class c means the destination falls in c mod 2^K, but the 8 lifts in the dispersal theorem are defined at precision K+v, not K. At precision K, many different bit patterns above position K could lead to the same return — the constraint is on bits at the current iterate's precision level, which may or may not translate to exactly 3 bits of n₀.
Theorem 4.3 (No Nontrivial Cycles): The logical structure is: pick an exit class c appearing in the cycle → the cycle visits c infinitely often → contradicts Corollary 4.2's finite bound. This logic would be valid if Corollary 4.2 were correct. But Corollary 4.2 inherits the flaws of Lemma 4.1. Moreover, there's a more fundamental issue: the cycle visits c periodically, say every P steps. Corollary 4.2 bounds visits within a K-bit band, but for a cycle, all iterates are bounded by definition, so K is fixed. The argument then says M visits require 3M fresh bits of n₀, but n₀ is a specific fixed integer — it has exactly K₀ bits, and you can't "consume" more than K₀. This part of the logic is actually internally consistent if the 3-bits-per-return claim held. The flaw propagates from Lemma 4.1.
Section 5: Deficit Transition Graph
Lemma 5.1 correctly notes that the Syracuse map is not consistent across precision levels. This is an important observation that actually undermines some of the paper's own arguments: the paper frequently reasons about the map at fixed precision K and draws conclusions about integer trajectories, but this lemma shows that precision-level behavior doesn't directly correspond to true trajectories.
Section 6: Parity Obstruction
Theorem 6.1: The claim that 3^P − 2^Q is odd for all P,Q > 0 is trivially correct (odd minus even = odd). The conclusion that 3^P ≢ 2^Q mod 2^k for any k ≥ 1 is correct. However, the paper overstates its significance: this only rules out multiplicative skeleton cycles, i.e., cycles where the affine correction C(v⃗) is ignored. Real cycles satisfy n₀ = C(v⃗)/(2^Q − 3^P), and the parity obstruction says nothing about whether this equation has positive integer solutions — that depends on the sign and magnitude of 2^Q − 3^P and the correction term C(v⃗). The paper acknowledges this (Remark 6.3) but still lists it as one of "six independent obstructions," which overstates its contribution to the proof.
Section 8: Irrationality Exclusion
Theorem 8.1: Correct and trivial (3^P ≠ 2^Q for positive integers by unique factorization). But the corollary claims this means "any genuine integer cycle must be governed by the affine cycle equation." This is true but provides no obstruction to cycles — it just says that if a cycle exists, n₀ = C(v⃗)/(2^Q − 3^P), which is the standard cycle equation. The irrationality of log₂(3) has been known and used in Collatz analysis for decades, and it does not by itself exclude cycles.
Section 9: Height Bound
Theorem 9.1 (H_k ≤ C log k): The "proof" of the analytical bound relies on Step 3, which states "computational verification shows that ≈ 70% of deficit block destinations land on non-deficit-starting classes." This is an empirical observation, not a proof. The paper says "the empirical bound H_k ≤ 6 log k is conservative" based on data for k ≤ 19 — this is numerical evidence, not a theorem. The analytical bound is unproven.
Lemma 9.5 (Growth Determinism): The statement that all bits of n_t are deterministic functions of n₀ mod 2^{K₀} is correct. But the claimed "consequence" — that "within-band rank bounds of Lemmas 9.2–9.4 cannot be circumvented by precision growth alone" — is a non-sequitur. The rank bound says a K₀-bit seed can return to a fixed class at most (K₀−6)/3 times within the K₀-bit band. When the iterate grows to K_t > K₀ bits, it's now operating in a larger space with more exit classes. The fact that the extra bits are deterministic doesn't mean the trajectory is still constrained by the K₀-bit budget — it means the trajectory's behavior in the larger space is determined, but the constraints of the smaller space no longer apply directly.
The paper's claim that "the rank budget (K₀−6)/3 is an invariant of the initial seed, not of the current iterate's bit-length" is the key unjustified leap. The rank budget bounds visits to a class at precision K₀. Once the iterate exceeds 2^{K₀}, the same integer may map to different classes at higher precision, and the K₀-precision constraint is no longer the binding one.
Section 10: Main Proof
Lemma 10.0, Part (a): The claim |H| ≤ 4 uses the "majority rule" — an exit class c ∈ H iff >50% of its lifts produce deficit blocks. The definition of "majority rule" for deficit classification is introduced without formal justification for why this is the right criterion. The argument mixes up different notions: the fraction of lifts producing deficit blocks (a property at precision K+v) with the probability that a trajectory starting at c enters a deficit block (which depends on the actual distribution of iterates, not uniform distribution of lifts).
Lemma 10.0, Part (b): Uses Gershgorin disks to bound ρ(P_H) ≤ |H|/8. The Gershgorin bound applies to row sums, giving ρ ≤ max_i Σ_j |P_{ij}|. With |H| ≤ 4 entries each ≤ 1/8, this gives ρ ≤ 4/8 = 1/2. But then the paper uses |H| = 2 to get ρ ≤ 1/4, and notes this "follows from the observation that only classes {125, 253} satisfy the majority rule at K=8, verifiable by inspection of 2 × 256 = 512 values." This switches from an analytical bound to a computational verification mid-proof, despite the lemma being explicitly positioned as deriving everything "from first principles" with "no computational experiment."
Lemma 10.0, Part (d) — Block drift values have an internal contradiction: The text states Δ_D ≤ 0.755, corresponding to block (2,2). But in the numerical certificates (Section 10.4), Δ_D = 1.3446. The paper claims the analytical bound suffices, but the analytical bound computes the maximum single-block drift, not the weighted average. The proof then uses Δ_D ≤ 0.755 in the critical inequality, but this is the drift of one specific block type, not the conditional expected drift given deficit. If a trajectory encounters deficit blocks of type (3,2) or higher, the actual drift can far exceed 0.755. The paper's own numerical certificate shows the true conditional average is 1.345, nearly double the "analytical bound."
Lemma 10.0, Part (e): Computes p* = 0.830/(0.755 + 0.830) = 0.524. But this uses the maximum single-block deficit drift (0.755) rather than the conditional expected deficit drift (1.345). If you use the true conditional average: p* = 2.124/(1.345 + 2.124) = 0.612. The paper gets away with this because the bound ρ̂ ≤ 1/3 is supposedly below both values. But the bound ρ̂ ≤ 1/3 is itself derived from the flawed rank-exhaustion argument.
Step B4 (Rank Exhaustion): This is where the proof's central gap lives. The paper states: "Each certified macro-block that remains within the deficit regime contributes three fresh congruence constraints on the initial seed n₀." This inherits all the flaws of Lemma 4.1. Furthermore, the step claims: "once 3M > K₀, no n₀ < 2^{K₀} can sustain M consecutive deficit blocks." But M here counts returns to a single exit class, not total deficit blocks. The paper silently drops "to a single exit class" and treats it as a bound on total deficit blocks, which is the core logical error.
Step B5: Derives ρ̂_N ≤ K₀/(3N) → 0 as N → ∞. This would mean the deficit fraction goes to zero for long trajectories, but this only follows if M_max bounds total deficit blocks, not visits to a single class. With 2^{K−3} exit classes, each visitable (K−6)/3 times, the actual bound on total deficit blocks would be 2^{K−3} · (K−6)/3, which is exponentially large in K — giving no useful constraint at all.
Step B6 (Base case): References Barina (2025) for verification up to 2^71, but the reference list cites Barina (2020) with verification to a smaller bound. The 2^71 claim may reference more recent work, but the citation is inconsistent.
Appendix F: M3 Extremality
Theorem F.1 (Sign Veto): The paper claims "all resonances P,Q with Q/P ≈ log₂(3) have Δ < 0." But then immediately lists P=5, Q=8 with Δ = +13 > 0. The paper tries to rescue this by saying "the drift δ(3) = −0.415 < 0 prevents reaching this scale before descent dominates," but this is circular — the entire point of the proof is to establish that descent dominates. The sign veto is therefore not an absolute veto for m=3; it only applies to the lowest-order resonance P=2.
Table F.1 comparison: At K=14, ρ(P_H) = 0.5 for m=3, but at K=8, ρ(P_H) = 0.068. These differ by almost an order of magnitude. The paper explains this as "different deficit sets H" at different precisions. But this means the "spectral radius" is not a robust property of the map — it's an artifact of the precision level and deficit-set definition chosen. The appendices use K=14 values for the T_m classification while Layer A uses K=8 values, and these give contradictory pictures of the same map.
Summary of the most critical flaws, ranked by severity:
-
Lemma 4.1's "3 fresh bits per return" is asserted, not proven, and the argument confuses bijectivity of affine maps with independence of constraint bits.
-
Step B4 silently equates "visits to one exit class" with "total deficit blocks", which is the gap that invalidates the entire descent argument.
-
Lemma 9.5's claimed consequence (that the rank budget is invariant under iterate growth) does not follow from the stated premises.
-
Lemma 10.0 Part (b) switches from "analytical" to computational verification while claiming to be purely analytical.
-
Theorem 9.1's "analytical proof" relies on empirical observations for k ≤ 19 rather than rigorous bounds.
-
The sign veto (Theorem F.1) is contradicted by the paper's own data (P=5, Q=8 gives Δ > 0).
That is some analysis. And though this critique is heavyweight, I still recommend the author for his attempt. He just should take what is good in his preprint, and move on to the next attempt. Because the ansatz, I feel, is right.
What do you think about the preprint? Did you like the critique by Opus? Was it wrong to post it? Leave a comment, please.
More papers on the Collatz Conjecture:

