Unsexy Science

2026-04-01

A Failed Attempt at the Collatz

For several months now I'm reading new papers about the Collatz conjecture, and there is some excitement percievable between the lines. At least, I would say, there has been more progress since Terence Tao's paper than in the two, three decades before. As a spectator using AI, I could support the correctness of two of recent papers by providing proof formalizations in the Lean language. This blog post, however, is about refutation of a proof attempt by A. Novgorodtsev (see the preprint here). This preprint caught my interest because it was well-informed and, on first glance, proposed sound arguments. The following is a critique by Opus 4.6 which I think is worth posting and being referred to.

Prompt:
Analyze the attached paper. describe the core argument chain (for the proof of the Collatz conjecture) and rate its soundness. Do not go into details, just look at the general argument.

I've read through the paper carefully. Here's my assessment of the core argument chain and its soundness.

Core Argument Chain

The proof attempts to show two things: no non-trivial cycles exist, and no trajectory escapes to infinity.

The logical skeleton runs roughly as follows:

Step 1 — Ensemble average. The expected 2-adic valuation E[v₂(3n+1)] = 2.0 is proven exactly via geometric series. This gives an average bit loss of ~0.415 per step. This part is correct and well-known.

Step 2 — Exit Dispersal & Bit Separation. Each "exit step" (where v₂ ≥ 3) maps one residue class to ≥ 8 distinct classes. Consecutive blocks induce nested modular refinements because 3^L is invertible mod 2^K. These are legitimate algebraic observations about the structure of the map.

Step 3 — Fresh 3-Bit Constraint. This is the paper's claimed key innovation. The argument: each return of a trajectory to a fixed exit class "consumes" 3 fresh bits of the seed's binary representation. After M returns, the seed is constrained to a set of size 2^{K−1−3M}, which becomes empty when M > (K−6)/3.

Step 4 — Rank Exhaustion → Bounded deficit fraction. Since a K₀-bit seed can sustain at most ~K₀/3 deficit macro-blocks, the empirical deficit fraction ρ̂ is bounded by ~1/3, which is below the critical threshold p* ≈ 0.524 needed to sustain growth. This gives strictly negative effective drift, proving descent.

Where the Argument Breaks Down

The critical weakness is in Step 3 → Step 4, which is where the paper claims to bridge the gap between ensemble behavior and individual trajectories. Several issues:

1. The "fixed exit class" assumption doesn't bind actual trajectories. Lemma 4.1 proves that a trajectory can't return to the same exit class more than ~K/3 times within a K-bit band. But a trajectory doesn't need to return to the same class — there are 2^{K−3} exit classes available. The trajectory can visit different exit classes each time, and the constraint counting doesn't apply across distinct classes in the way the proof needs.

2. The deficit fraction bound doesn't follow. The paper claims that because each exit class can only be visited ~K/3 times, the total number of deficit blocks is bounded by M_max = (K₀−6)/3. But this bounds visits to a single class, not total deficit blocks. The trajectory can accumulate deficit blocks by visiting many different exit classes. The paper never adequately addresses this distinction — it silently equates "consecutive returns to one class" with "total deficit blocks," which is the core non-sequitur.

3. Growth Determinism (Lemma 9.5) doesn't rescue the argument. The paper correctly notes that all bits of future iterates are deterministic functions of the seed. But the claim that "the rank budget is an invariant of the initial seed" conflates the constraint on visits-to-one-class with a constraint on total deficit blocks. When the iterate grows beyond the K₀-bit band, it enters a regime with more exit classes and more room for deficit behavior, and the fixed-precision counting argument no longer confines the global trajectory.

4. The drift calculation assumes what it needs to prove. Step B5 computes Δ_eff using the bound ρ̂ ≤ 1/3, but this bound was derived from the flawed rank-exhaustion step. The entire quantitative closure (the key inequality ρ̂ < p*) inherits this gap.

Rating

The paper contains genuine and correct mathematical content — the ensemble identity, the exit dispersal theorem, the bit separation lemma, and the parity obstruction are all sound. The algebraic machinery is real. The extensive T_m family analysis in the appendices is interesting exploratory work.

However, the central bridge from ensemble to trajectory — the claimed main contribution — has a logical gap. The Fresh 3-Bit Constraint bounds repeated visits to a single exit class, but the proof needs a bound on total deficit blocks across all classes. This gap is papered over rather than resolved, and it is precisely the gap that Tao identified as the core obstruction. The paper acknowledges Tao's barrier but does not actually overcome it.

Soundness: the proof is not valid as stated. The individual lemmas are mostly correct, but the argument chain does not close. This is a sophisticated and well-organized attempt, but it falls into the classic trap of Collatz proof attempts: converting a true statistical statement into a per-trajectory guarantee without sufficient justification.

Prompt:

Identify flaws in the details of the arguments, ie where there are miscalculations, wrong logic steps, unproven assumptions or statements etc

Let me go through the paper section by section and identify specific flaws.

Section 2: Syracuse Map and Block Decomposition

Theorem 2.4 (Bad Class Count): The claim |Bad(k,L)| = 3^L · 2^{k−1−2L} with fraction (3/4)^L is correct for the first step's classification but the proof claims "each Syracuse step constrains 2 bits of the current iterate" and "by Bit Separation, these constraints pull back to 2 independent bits of the seed." This is stated without proof. The pullback to independent bits of the seed requires that the affine maps at each step don't create dependencies between the bit-level constraints. The paper asserts independence but the affine structure (multiplication by 3, addition of 1) creates carries that entangle bits. The theorem is likely true for the stated fraction, but the proof's reasoning about "2 independent bits of the seed" per step is hand-waved.

Theorem 2.5 (E[v₂] = 2.0): Correct. This is a standard and well-known result. The proof is valid.

Section 3: Exit Dispersal and Bit Separation

Theorem 3.1 (Exit Dispersal): The arithmetic is correct. Given c ≡ 5 mod 8, the 2^v lifts at precision K+v do map to 2^v distinct classes mod 2^K with spacing 3·2^{K−v}. This is a legitimate algebraic property.

However, there's a subtle scope issue: the theorem is stated at precision K, but the lifts are taken at precision K+v. The paper uses this at fixed precision K throughout the proof (e.g., in Lemma 4.1), where one doesn't have access to K+v bits — one has exactly K bits. The dispersal describes what happens when you refine precision, not what a fixed-precision trajectory does.

Theorem 3.2 (Bit Separation): The formula x_L = (3^L · n + c_L)/2^{S_L} is correct. The injectivity claim (distinct residues mod 2^{K+S_L} map to distinct residues mod 2^K) follows from gcd(3^L, 2^{S_L}) = 1. This is sound.

Corollary 3.2.1 (Modular refinement, carry-safe): This is where problems begin. The corollary claims the exponents K₀ + r_i "strictly increase" along consecutive certified blocks. This requires that each block's cumulative shift r_i = S_L + v_exit is strictly positive, which is true (since S_L ≥ 1 and v_exit ≥ 3). But the claim that the refinements are "nested" assumes the constraints from consecutive blocks compose cleanly — that is, the constraint from block i+1 refines the constraint from block i by fixing additional bits beyond those already fixed.

The problem: the bits being "fixed" by block i+1 depend on the output of block i, not directly on the seed n₀. The pullback to n₀ goes through the composition of all prior affine maps. While each individual map is invertible mod 2^t, the composition means that the "fresh" bits constrained by block i+1 are entangled with previously constrained bits of n₀ through carries in the affine maps. The corollary asserts this doesn't happen ("robust to carries") but the justification — "all pullback maps are affine with odd multipliers, hence bijective" — only guarantees bijectivity, not that the newly constrained bits are disjoint from previously constrained bits. Bijectivity means no information is lost, but it doesn't mean information is partitioned into independent chunks.

Section 4: The Fresh 3-Bit Constraint

Lemma 4.1 — This is the central claim and has multiple issues:

Flaw 1: "≥ 3 fresh bits per return" is not proven. The argument says: at the exit step, v₂(3x+1) ≥ 3, so there are 2^3 = 8 distinct lifts, and specifying that the orbit returns to class c "selects one such lift and therefore fixes at least three additional 2-adic bits." But this conflates two things: the dispersal tells you that the destination is spread among 8 classes, but requiring the destination to equal a specific class c constrains the bits of the current iterate x, not necessarily 3 fresh bits of the original seed n₀. The pullback to n₀ goes through a chain of affine maps with carries. Each return does impose some constraint on n₀, but the claim that it's exactly "3 fresh bits" (meaning 3 bits independent of all previously constrained bits) is asserted, not proven.

Flaw 2: The shrinkage factor 1/8 per return is an upper bound only under independence. The claim |Ret_M(c)| ≤ 2^{K−1−3M} requires that each return reduces the admissible set by a factor of exactly 1/8. This would be true if each return's constraint were independent of prior constraints. But successive returns through the same class create correlated constraints — the affine map compositions share structure. The paper relies on Bit Separation (bijectivity) to assert independence, but bijectivity ≠ independence of constraint bits.

Flaw 3: The proof says "specifying that the orbit returns to the same exit class c mod 2^K selects one such lift." This is wrong. The orbit returning to class c means the destination falls in c mod 2^K, but the 8 lifts in the dispersal theorem are defined at precision K+v, not K. At precision K, many different bit patterns above position K could lead to the same return — the constraint is on bits at the current iterate's precision level, which may or may not translate to exactly 3 bits of n₀.

Theorem 4.3 (No Nontrivial Cycles): The logical structure is: pick an exit class c appearing in the cycle → the cycle visits c infinitely often → contradicts Corollary 4.2's finite bound. This logic would be valid if Corollary 4.2 were correct. But Corollary 4.2 inherits the flaws of Lemma 4.1. Moreover, there's a more fundamental issue: the cycle visits c periodically, say every P steps. Corollary 4.2 bounds visits within a K-bit band, but for a cycle, all iterates are bounded by definition, so K is fixed. The argument then says M visits require 3M fresh bits of n₀, but n₀ is a specific fixed integer — it has exactly K₀ bits, and you can't "consume" more than K₀. This part of the logic is actually internally consistent if the 3-bits-per-return claim held. The flaw propagates from Lemma 4.1.

Section 5: Deficit Transition Graph

Lemma 5.1 correctly notes that the Syracuse map is not consistent across precision levels. This is an important observation that actually undermines some of the paper's own arguments: the paper frequently reasons about the map at fixed precision K and draws conclusions about integer trajectories, but this lemma shows that precision-level behavior doesn't directly correspond to true trajectories.

Section 6: Parity Obstruction

Theorem 6.1: The claim that 3^P − 2^Q is odd for all P,Q > 0 is trivially correct (odd minus even = odd). The conclusion that 3^P ≢ 2^Q mod 2^k for any k ≥ 1 is correct. However, the paper overstates its significance: this only rules out multiplicative skeleton cycles, i.e., cycles where the affine correction C(v⃗) is ignored. Real cycles satisfy n₀ = C(v⃗)/(2^Q − 3^P), and the parity obstruction says nothing about whether this equation has positive integer solutions — that depends on the sign and magnitude of 2^Q − 3^P and the correction term C(v⃗). The paper acknowledges this (Remark 6.3) but still lists it as one of "six independent obstructions," which overstates its contribution to the proof.

Section 8: Irrationality Exclusion

Theorem 8.1: Correct and trivial (3^P ≠ 2^Q for positive integers by unique factorization). But the corollary claims this means "any genuine integer cycle must be governed by the affine cycle equation." This is true but provides no obstruction to cycles — it just says that if a cycle exists, n₀ = C(v⃗)/(2^Q − 3^P), which is the standard cycle equation. The irrationality of log₂(3) has been known and used in Collatz analysis for decades, and it does not by itself exclude cycles.

Section 9: Height Bound

Theorem 9.1 (H_k ≤ C log k): The "proof" of the analytical bound relies on Step 3, which states "computational verification shows that ≈ 70% of deficit block destinations land on non-deficit-starting classes." This is an empirical observation, not a proof. The paper says "the empirical bound H_k ≤ 6 log k is conservative" based on data for k ≤ 19 — this is numerical evidence, not a theorem. The analytical bound is unproven.

Lemma 9.5 (Growth Determinism): The statement that all bits of n_t are deterministic functions of n₀ mod 2^{K₀} is correct. But the claimed "consequence" — that "within-band rank bounds of Lemmas 9.2–9.4 cannot be circumvented by precision growth alone" — is a non-sequitur. The rank bound says a K₀-bit seed can return to a fixed class at most (K₀−6)/3 times within the K₀-bit band. When the iterate grows to K_t > K₀ bits, it's now operating in a larger space with more exit classes. The fact that the extra bits are deterministic doesn't mean the trajectory is still constrained by the K₀-bit budget — it means the trajectory's behavior in the larger space is determined, but the constraints of the smaller space no longer apply directly.

The paper's claim that "the rank budget (K₀−6)/3 is an invariant of the initial seed, not of the current iterate's bit-length" is the key unjustified leap. The rank budget bounds visits to a class at precision K₀. Once the iterate exceeds 2^{K₀}, the same integer may map to different classes at higher precision, and the K₀-precision constraint is no longer the binding one.

Section 10: Main Proof

Lemma 10.0, Part (a): The claim |H| ≤ 4 uses the "majority rule" — an exit class c ∈ H iff >50% of its lifts produce deficit blocks. The definition of "majority rule" for deficit classification is introduced without formal justification for why this is the right criterion. The argument mixes up different notions: the fraction of lifts producing deficit blocks (a property at precision K+v) with the probability that a trajectory starting at c enters a deficit block (which depends on the actual distribution of iterates, not uniform distribution of lifts).

Lemma 10.0, Part (b): Uses Gershgorin disks to bound ρ(P_H) ≤ |H|/8. The Gershgorin bound applies to row sums, giving ρ ≤ max_i Σ_j |P_{ij}|. With |H| ≤ 4 entries each ≤ 1/8, this gives ρ ≤ 4/8 = 1/2. But then the paper uses |H| = 2 to get ρ ≤ 1/4, and notes this "follows from the observation that only classes {125, 253} satisfy the majority rule at K=8, verifiable by inspection of 2 × 256 = 512 values." This switches from an analytical bound to a computational verification mid-proof, despite the lemma being explicitly positioned as deriving everything "from first principles" with "no computational experiment."

Lemma 10.0, Part (d) — Block drift values have an internal contradiction: The text states Δ_D ≤ 0.755, corresponding to block (2,2). But in the numerical certificates (Section 10.4), Δ_D = 1.3446. The paper claims the analytical bound suffices, but the analytical bound computes the maximum single-block drift, not the weighted average. The proof then uses Δ_D ≤ 0.755 in the critical inequality, but this is the drift of one specific block type, not the conditional expected drift given deficit. If a trajectory encounters deficit blocks of type (3,2) or higher, the actual drift can far exceed 0.755. The paper's own numerical certificate shows the true conditional average is 1.345, nearly double the "analytical bound."

Lemma 10.0, Part (e): Computes p* = 0.830/(0.755 + 0.830) = 0.524. But this uses the maximum single-block deficit drift (0.755) rather than the conditional expected deficit drift (1.345). If you use the true conditional average: p* = 2.124/(1.345 + 2.124) = 0.612. The paper gets away with this because the bound ρ̂ ≤ 1/3 is supposedly below both values. But the bound ρ̂ ≤ 1/3 is itself derived from the flawed rank-exhaustion argument.

Step B4 (Rank Exhaustion): This is where the proof's central gap lives. The paper states: "Each certified macro-block that remains within the deficit regime contributes three fresh congruence constraints on the initial seed n₀." This inherits all the flaws of Lemma 4.1. Furthermore, the step claims: "once 3M > K₀, no n₀ < 2^{K₀} can sustain M consecutive deficit blocks." But M here counts returns to a single exit class, not total deficit blocks. The paper silently drops "to a single exit class" and treats it as a bound on total deficit blocks, which is the core logical error.

Step B5: Derives ρ̂_N ≤ K₀/(3N) → 0 as N → ∞. This would mean the deficit fraction goes to zero for long trajectories, but this only follows if M_max bounds total deficit blocks, not visits to a single class. With 2^{K−3} exit classes, each visitable (K−6)/3 times, the actual bound on total deficit blocks would be 2^{K−3} · (K−6)/3, which is exponentially large in K — giving no useful constraint at all.

Step B6 (Base case): References Barina (2025) for verification up to 2^71, but the reference list cites Barina (2020) with verification to a smaller bound. The 2^71 claim may reference more recent work, but the citation is inconsistent.

Appendix F: M3 Extremality

Theorem F.1 (Sign Veto): The paper claims "all resonances P,Q with Q/P ≈ log₂(3) have Δ < 0." But then immediately lists P=5, Q=8 with Δ = +13 > 0. The paper tries to rescue this by saying "the drift δ(3) = −0.415 < 0 prevents reaching this scale before descent dominates," but this is circular — the entire point of the proof is to establish that descent dominates. The sign veto is therefore not an absolute veto for m=3; it only applies to the lowest-order resonance P=2.

Table F.1 comparison: At K=14, ρ(P_H) = 0.5 for m=3, but at K=8, ρ(P_H) = 0.068. These differ by almost an order of magnitude. The paper explains this as "different deficit sets H" at different precisions. But this means the "spectral radius" is not a robust property of the map — it's an artifact of the precision level and deficit-set definition chosen. The appendices use K=14 values for the T_m classification while Layer A uses K=8 values, and these give contradictory pictures of the same map.

Summary of the most critical flaws, ranked by severity:

Lemma 4.1's "3 fresh bits per return" is asserted, not proven, and the argument confuses bijectivity of affine maps with independence of constraint bits.
Step B4 silently equates "visits to one exit class" with "total deficit blocks", which is the gap that invalidates the entire descent argument.
Lemma 9.5's claimed consequence (that the rank budget is invariant under iterate growth) does not follow from the stated premises.
Lemma 10.0 Part (b) switches from "analytical" to computational verification while claiming to be purely analytical.
Theorem 9.1's "analytical proof" relies on empirical observations for k ≤ 19 rather than rigorous bounds.
The sign veto (Theorem F.1) is contradicted by the paper's own data (P=5, Q=8 gives Δ > 0).

That is some analysis. And though this critique is heavyweight, I still recommend the author for his attempt. He just should take what is good in his preprint, and move on to the next attempt. Because the ansatz, I feel, is right.

What do you think about the preprint? Did you like the critique by Opus? Was it wrong to post it? Leave a comment, please.

More papers on the Collatz Conjecture:

https://ccchallenge.org/

2022-10-12

Is there 5-Star Linked Open Metadata of mass spectrometry datasets?

My recent interest in mass spectrometry (MS) open data followed logically after I played with analyzing open metagenomics datasets. Both MS and metagenomics can be applied in an untargeted mode, and with a serendipitous mindset, yielding data that frequently contains unexpected results, solely depending on the questions asked.

In the case of metagenomics I used the Kraken software ([1]) to find unexpected species in sequenced samples: old versions of invertebrate genomes deposited at NCBI RefSeq turned out to be contaminated with bacterial DNA (unpublished data at [2]); thankfully RefSeq submissions are now screened for this and it can no longer happen. In another investigation I probed human samples for specific fungi ([3]), following up on a study that suggested the fungi's involvement in Alzheimer's disease.

If the promise of open mass spectrometry datasets were only taxonomic identification of samples, through finding taxon-specific peptides and metabolites, it would be interesting enough, for example looking at clinical data and scanning for possible signals of unexpected infections. But untargeted metabolomics datasets have much more, namely >90% spectra of unidentified molecules which might turn out to be completely unseen before. The crux is that it is so difficult to go from MS spectrum to structural identification. At the moment (2022) this is impossible to do in bulk, in an automated way ([4]).

Having open datasets is a prerequisite to even begin analysis of MS spectra, but even more important, especially with MS, is the information about the experiments, the metadata. It is highly structured, specialized, and voluminous due to the number of experiments that can be performed in short time using a mass spectrometer. Accurate and open metadata is a necessity for analyzing the growing number of deposited datasets. This leads to our question: what is the state of Linked Open Metadata of mass spectrometry datasets?

Wikipedia has the definition of different qualities of Open Data([5]):

Tim Berners-Lee has suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data:^[10]

1 star: data is openly available in some format.
2 stars: data is available in a structured format, such as Microsoft Excel file format (.xls).
3 stars: data is available in a non-proprietary structured format, such as Comma-separated values (.csv).
4 stars: data follows W3C standards, like using RDF and employing URIs.
5 stars: all of the other, plus links to other Linked Open Data sources.

In this spirit I will give a list of open MS dataset repositories, together with the star rank of their metadata that I encountered in October 2022. Study-wide metadata usually contains sample-wide and experiment-wide metadata. The following entries are in no particular order.

Study data

★★★☆☆ MetaboLights: XML available but it is unclear if this is complete, as many original datasets aren't converted to open formats.

★★★☆☆ massive.ucsd.edu: Export of limited structured study-metadata from searches. See below.

★★★☆☆ redu.ucsd.edu: More extensive and curated study-/sample-/experiment-metadata of Massive datasets. The caveat of unconverted datasets applies as well.

☆☆☆☆☆ metabolomicsworkbench.org: no metadata download. They require only minimum metadata on upload, anyway. The caveat of unconverted datasets applies as well.

Spectrum Libraries

★★★☆☆ mona.fiehnlab.ucdavis.edu: Structured data of compound-spectrum-experiment aggregates, meaning limited metadata is copied to each spectrum/compound entry. Limited to identified spectra but includes in silico models.

☆☆☆☆☆: gmd.mpimp-golm.mpg.de: spectra download possible, but no metadata associated with spectra

☆☆☆☆☆: HMDB: the experimental spectra file download link gave me a 502 Bad Gateway

☆☆☆☆☆: mzcloud: no metadata download

☆☆☆☆☆: metlin.scripps.edu: no metadata download

☆☆☆☆☆: metabolome-express.org: site unreachable

☆☆☆☆☆: mmcd.nmrfam.wisc.edu: site unreachable

Conclusion

There is no 5-Star Linked Open Metadata of mass spectrometry datasets. MetaboLights and ReDU clearly are dedicated to provide metadata in structured format with the datasets, the same applies to MoNA doing a great job associating metadata with single spectra. The importance and the promise of having these associations is convincingly shown in the ReDU ([6]) and MoNA ([7]) papers. It remains to be seen if it will be possible to extract full metadata from the datasets deposited at Metabolomics Workbench.

2022-Oct-12

Ralf Stephan, developer and biocurator

2022-02-12

Why you won't play a relaxed game of chess with Alexa in the foreseeable future

The promise of automatic speech recognition (ASR) providing a hands-free experience is fulfilled in countless specialist PC software products, for example dictation software. After starting up, dictation software does not need a wake-word for doing its job, even after minutes of silence. Neither is it the case with military speech recognition---imagine a fighter pilot needing to speak a wake-word in a high-risk situation. So why does Alexa (or, for that matter Google Assistant and Siri) have wake-words?

But it's not the wake-word only. If you wait too long giving a reply to a question from Alexa, or giving a second command associated with a previous command, Alexa will stop listening, making a blip sound, and will require you to say the wake-word again, before listening again. The time Alexa listens to your silence is fixed to eight seconds (same with Google). Isn't this annoying? The Follow-Up mode implemented in Alexa in 2021 does not remove this restriction, it just removes the requirement for a wake-word within those eight seconds.

The difference between PC or military ASR and virtual assistants is manyfold:

the former run on a PC or embedded system, and are used in a workplace setting. Most assistants are installed at home. However, dictation software is also used by disabled people to control the PC in a home setting, but this is minor usage if you count the numbers.
speech-to-text transformation of specialist ASR happens completely in the PC/device. In contrast virtual assistants transfer speech to the Cloud for processing. This is a legal nightmare for privacy reasons. But apparently the central processing and the reduced need for software updates in devices make this system design attractive despite the legal minefields.
Alexa in particular allows skill distribution by external developers in the Amazon Skill Store, much like apps for mobile devices. However, any mobile app using the microphone has to be given explicit permission by the user, while Alexa uses the microphone per default unless explicitly turned off. So Amazon just assumes the worst case and avoids legal problems by making Alexa strictly listen for eight seconds in all skills.

You can probably see now why you never will play a game of chess using Alexa, sitting on your sofa in front of a big TV showing a chess board, and leisurely moving pieces by saying "move pawn to e4". After you see the move of your opponent you ponder for minutes and say "take the pawn". And you can say "pause" or "quit game" after minutes and Alexa will know what you mean.

No. It will always be like this: The opponent moves. You ponder. After eight seconds you here a faint BLIP. Now you need to say "Alexa, open chess game and take pawn". For every of your moves that takes longer than eight seconds to ponder. Because the device needs to hear the wake-word to start listening again, and because the skill context has been lost.

And that is very unsexy for any developer trying to provide services to the user. I wanted to write a chess skill for Alexa to help me or other physically disabled persons. But now I'm starting to think about PC-centered solutions again.

2018-02-15

This is Chess960 position #310, the prototype of the hanging-rook motive. After 1.c3 Black has only 1..Ng6 or 1..g6. The latter leaves the knight in the corner. A possible continuation: 2.d4 Bg7 3.e4 c5 4.Ng3 cxd4 5.cxd4 O-O 6.d5 and Black is cramped. It looks like only 1..Ng6 will lead to balanced positions, after some forced moves. After 2.h4 h5 3.d4 c6 4.Ng3 Black will want to break symmetry with 4..e5. After 5.Bg5 Be7 6.Bxe7 Nxe7 the initial problems are resolved.

After 6...Nxe7

This means that there is at least one Chess960 starting position that, with best play by White, forces Black tactically to make specific moves. So if you play the Fischer chess variant you need to know these positions.

2015-10-22

Survey: Sage and Enumerative Combinatorics

The project I'm helping out with for nearly two years now is Sage Math, which has 700k lines of Python code that glue about a hundred open source math software packages into one tool conglomerate. My mathematical interest was always discrete math, and the recent developments in symbolic computation fueling the new experimental mathematics fascinate me especially. This made me naturally gravitating towards advancing symbolics in Sage which, I regret to say, is in a poor (unsexy) state because most main developers of Sage are interested in abstract algebra and number theory, but less in enumerative combinatorics, nor in experimental symbolics, or even calculus.

Experimental symbolics is very successful in discrete math, especially enumerative combinatorics. The natural question arises, how far can Sage help with this branch of symbolics? To this end I present a table of respective mathematical objects and algorithms, and the support Sage has for them. I'm leaning heavily on the recent summary of computer algebra relevant for enumerative combinatorics by Manuel Kauers (published in Bona's new Handbook of Combinatorics).

Sage capability survey (Fall 2015)
Computation in/with	Status	Comments
Finite fields	✓	Documentation
Lattice reduction	✓	Documentation
Multivariate polynomials	✓	Documentation
Gröbner bases	✓	Documentation
Algebraic number arithmetic	✓	Documentation
Cylindrical Algebraic Decomposition	✓	Documentation (from Sage version 6.10.p2 up)
Formal power series	✓	Two implementations, a fast one missing most symbolic function expansions, and a slower one with function expansions, but neglected having many bugs. Both not interoperating.---Documentation1, Documentation2
Lazy power series	✓	rudimentary---Documentation
Laurent series	✗	Only univariate available
Puiseux series	✗
Ore algebras	✓	optional package ore-algebra
C-finite sequences	✓	Documentation
D-finite sequences	✗
Combinatorial species	✓	Documentation
Omega analysis (partitions)	✗
Ehrhart theory	✗	incomplete, in progress
Computational group theory	✓	available via GAP
Symbolic summation: Gosper's algorithm	✓	part of sum(), available via Maxima
Zeilberger's algorithm	✓	part of sum(), available via Maxima
Petkovšek's algorithm	✗
Karr's algorithm	✗

Creative telescoping	✗
ΠΣ-theory	✗
Holonomic functions	✗

2013-09-21

Random 100 sequences from the OEIS---a survey.

Summary: there were 16 holonomic, 16 prime, 11 digital, 7 constants, 4 arbitrary, 28 number theoretic, 13 combinatorial, four group theoretical, and one physics sequences in a random sample of 100 sequences from the OEIS.

The field I'm feeling most natural is mathematics, and I think my most successful work is associated with the OEIS database of integer sequences which sparked all my papers so far. To get an impression on what type of OEIS entries there are, I decided to work on a random sample of one hundred of them, and try to classify them.

So, let's get a random sample. Welcome to a hundred random numbers between 1 and 229000:

? for(i=1,100,print1(random(229000),","))

First, there are polynomials in n, linear recurrences with constant coefficients (or lin-recs as the editors call them frequently), and other holonomic sequences. This is basic stuff, although not completely uninteresting. Many at first really interesting sequences later turn out to be of this type: A004315, A005056, A009671, A012845, A013081, A029920, A070358, A107396, A109794, A132200, A133886, A135493, A140405, A175485, A193931, A213036
Then, the sequences involving primes. In my personal opinion most such sequences are random (no formulae possible), and you can't say much about them in terms of conjectures, although they may not be unimportant to have in the database: A003631, A007996, A013637, A022465, A045467, A066520, A086762, A088592, A090725, A100669, A105998, A118812, A120853, A122413, A142247, A188754
Sequences involving decimal, and other digits: A034967, A037914, A053974, A061958, A075009, A092995, A095827, A102120, A117860, A141063, A209859
A certain amount of OEIS entries are decimal expansions of constants. The justification to include them is the benefit for inverse calculations, and as a point where to collect statements and references about the respective constant: A088543, A153205, A154167, A196505, A196758, A198565, A201848
Some sequences are so arbitrary that, although they could be interesting, it would be better to look at a definition or formula with small constants first and generalize from that. If the submitter gives no reason for the importance of such an arbitrary sequence, it is most likely unimportant. I found the following that fit this description: A030835, A040566, A152339, A182771

Now, the rest is what many OEIS editors agree to be interesting.
The really interesting sequences can be divided according to the field of mathematics they arise in, so let me list them so grouped. From here I will give one-liner definitions and make them clickable.

Number theory
A002547 Numerator of {n-th harmonic number H(n) divided by (n+1)}.
A004618 Divisible only by primes congruent to 4 mod 5.
A033831 Number of d dividing n such that d>=3 and 1<=n/d<=d-2.
A049384 a(0)=1, a(n+1) = (n+1)^a(n).
A060553 Symmetric patterns in the cellular automaton that generates Pascal's triangle modulo 2.
A064031 Product of non-unitary divisors of n!.
A081474 Distinct lines through the origin in n-dimensional cube of side length n.
A088138 Generalized Gaussian Fibonacci integers.
A088303 Smallest integer value of n!/ ( 1!a!b!c!...) ...
A089552 Sum of legs of primitive Pythagorean triangles having legs that add up to a square, sorted on hypotenuse.
A094234 Period of terms in continued fraction expansion of 2^n*tanh(1).
A117658 Number of solutions to x^(k+1)=x^k mod n for some k>=1.
A120615 sum(k=0,n,floor(phi*floor(n/phi))) where phi=(1+sqrt(5))/2.
A139799 n>=2 such that there is an integer k>1 with k divides n and k divides (n/k)+1.
A140418 Position of cubes in the EKG sequence.
A141321 Special sum of divisors of n.
A152066 Coefficients of certain polynomials.
A160394 Numbers n = p*q*r (p, q, r prime) congruent to 0 mod p+q+r.
A172819 Number of n X 9 0..4 arrays with row sums 9 and column sums n.
A173931 Primitive numbers k such that m/k is in the Cantor set for some m.
A178272 Number of collinear point 7-tuples in an n X n .. X n 4-dimensional cubical grid.
A178535 Matrix inverse of A178534.
A185383 Denominator of the fraction |n^2/A049417(n)-A064380(n)|.
A189675 Composition of Catalan and Fibonacci numbers.
A200521 Numbers n such that omega(n)=4 but bigomega(n)>4.
A218335 Even numbers n such that the largest value in trajectory of n under the juggler map is greater than n.
A227128 The twisted Euler phi-function for the non-principal Dirichlet character mod 3.
A227434 Value of row n in Pascal's triangle mod 3 seen as ternary number.

Enumerative combinatorics
A028461 Number of perfect matchings in graph P_{3} X C_{4} X P_{n}.
A057545 Maximum cycle size in range...
A124419 Number of partitions of the set {1,2,...n} having no blocks that contain both odd and even entries.
A135493 Number of ways to toss a coin n times and not get a run of six.
A149516 Number of walks within N^3 (the first octant of Z^3) starting...
A183882 Number of arrangements of n+2 numbers in 0..7 with ...
A185334 Number of not necessarily connected 3-regular simple graphs on 2n vertices with girth at least 4.
A186764 Permutations of {1,2,...,n} having k increasing even cycles.
A207224 Number of nX4 0..2 arrays avoiding the patterns ...
A208545 Number of 7-bead necklaces of n colors allowing reversal, with no adjacent beads having the same color.
A211359 Noncrossing partitions up to rotation and reflection of an n-set that contain k singleton blocks.
A214130 Partitions of n into parts congruent to +-2, +-3 (mod 13).
A227189 (k+1)-th part of the unordered partition which has been encoded in the binary expansion of n.

Group theory
A019537 Number of special orbits for dihedral group of degree n.
A057743 Maximal order of element of alternating group A_{2n+1}.
A170263 Number of reduced words of length n in Coxeter group on 14 generators
A214464 Degrees of irreducible representations of Suzuki group Sz(32).

Mathematical physics
A008199 Coordination sequence T4 for Zeolite Code MTW.

So, now you have a pretty good overview of what kind of OEIS entries exist, and what OEIS editors think are interesting submissions. If such pearls as the above can be found in a random sample of 100, what treasures might lurk there in the whole thing? Look for yourself!

2011-09-03

Methionine

This story has three parts: Met salvage, catabolism, and urology. And it spans three decades of missing research.

L-Methionine (Met) is an essential amino acid. Its use is to take part in Met-RNA and protein biosynthesis, and the synthesis of S-Adenosylmethionine (SAM). In all cases it is recycled. Even when SAM is used to produce polyamines, the sulfur is recycled to Met via the Met salvage pathway. However, if you take a Met overdose -- say 1 or 2 grams orally -- the excess doesn't show in the blood for long, and is degraded or changed quickly. It appears to be well known[1] that this excess leads to an excess of sulfate which is excreted with urine. Around 1985, at least two reactions were hypothesized for excess Met -- transamination to 4-methylthio-2-oxobutanoate (MOB) and transmethylation-transsulfuration via SAM, homocysteine and cystathionine -- with inconclusive results on which is the main path[2]. The transamination reaction to MOB certainly plays a role[3] but where the sulfate comes from quantitatively (MOB or cystathionine) is still unclear, as well as the whole regulation issue in such a tightly regulated system. Possibly the location, cytosol or mitochondria, makes a difference. Meanwhile, a review elucidated the cysteine catabolic branch[4]. So, a complete characterization of the Met-catabolic pathway via transamination -- or the proof of it being irrelevant awaits the trophy-hungry lab rat.

Additionally, in the Met salvage pathway, we don't know exactly the human gene producing the necessary methylthioribulose 1-phosphate dehydratase activity (EC 4.2.1.109). From homology to yeast, it might be APIP but the human activity was never shown. And finally, while transamination to and from Met is proven, which of the many transaminases has that broad specificity to also take on Met? Our guess it's the GGT but noone bothered to test it for decades.

Finally, the sulfate excretion accounting for the acidification potential of Met[5], according to my urologist, this is the only compound with that effect on humans. There may be also ammonium chloride (ref?). Okay, there is the n=60 study[6] showing diluted vinegar being effective in urinary tract infection (UTI), but would you drink it daily to prevent infections? Surprisingly, although the beneficial effect of low pH urine for UTI prevention is beyond doubt, there is no clinical study using Met for this. It would be so easy, the pH test strips and Met itself are inexpensive, so please someone take up this piece of Unsexy Science!

Refs:
1. Mudd, S. H., and H. L. Levy. 1983. Disorders of Transsulfuration. In: The Metabolic Basis of Inherited Disease. 5th edition. J. B. Stanbury, J. B. Wyngaarden, D. S. Fredrickson, J. L. Goldstein, and M. S. Brown, editors. McGraw-Hill Book Co., Inc., New York. 522-559. (unchecked)
2. J. D. Finkelstein, J. J. Martin: Methionine metabolism in mammals. Adaptation to methionine excess. In: J biol chem 261, 4, 1986, 1582–1587. PMID 3080429.
3. W. A. Gahl, I. Bernardini et al.: Transsulfuration in an adult with hepatic methionine adenosyltransferase deficiency. In: J clin. invest. 81, 2, 1988, 390–397. doi:10.1172/JCI113331. PMID 3339126. PMC 329581.
4. M. H. Stipanuk, I. Ueki: Dealing with methionine/homocysteine sulfur: cysteine metabolism to taurine and inorganic sulfur. In: Journal of inherited metabolic disease 34, 1, 2011, 17–32. doi:10.1007/s10545-009-9006-9. PMID 20162368. PMC 290177. (Review)
5. D. L. Bella, M. H. Stipanuk: Effects of protein, methionine, or chloride on acid-base balance and on cysteine catabolism. In: Am J phys 269, 5 Pt 1, 1995, E910–E917. PMID 7491943.
6. Y. C. Chung, H. H. Chen, M. L. Yeh: Vinegar for Decreasing Catheter-Associated Bacteriuria in Long-Term Catheterized Patients : A Randomized Controlled Trial. In: ''Biological research for nursing'' epub 2011. doi:10.1177/1099800411412767. PMID 21708892.

Welcome!