
Studying the longest head runs in coin tossing has a very long history, starting in gaming and probability theory. Today, it has applications in cryptography and insurance. For random sequences or Bernoulli trials, the associated statistical properties and distributions have been studied in detail, even when the proportions of zero and one are different. Yet, I could not find any discussion on deterministic sequences, such as the digits or irrational numbers. The case study investigated here fills this gap, focusing on one of the deepest and most challenging problems in number theory: almost all the questions about the distribution of these digits, even the most basic ones such as the proportions of zero and one, are still unsolved conjectures to this day.
In this context, a run is a sequence of successive, identical digits. In random sequences of bits, runs have a specific probability distribution. In particular, the maximum length of a run in a random sequence of n binary digits has expectation log2 n (the logarithm of n in base 2). This fact can be used to test if a sequence violates the laws of randomness. Pseudo-random number generators that do not pass this test are not secure. The focus here is on sequences of binary digits of numbers such as SQRT(2). If working in base 10 rather than binary digits, replace log2 by the decimal logarithm.
In this short technical note, I feature a new result (with proof and empirical verification) about the largest run of zeros that you can potentially have, starting at position n in the binary expansion of these numbers. Empirical evidence strongly suggests that as n increases (thus, asymptotically), the upper bound is indeed log2 n. If you can prove or disprove this conjecture, I will offer you a $1 million award, to match the offer by the Clay Institute to solve the Riemann Hypothesis. More on this in a future article. This is a serious offer: contact me if you want to know the details.
Of course, what I was able to prove is a much weaker result, but still spectacular. I spent a considerable amount of time trying to improve it, to non-avail. Based on my experience, I know that this is an extremely challenging problem, harder than the Riemann Hypothesis. Do not try to solve it, there are many much easier ways to make $1 million. Nevertheless, I will pay you if you do. To see my weaker result and the Python code to efficiently compute billions of the binary digits in question, download this 5-page paper here on GitHub (no cost, no sign-up required). It features scientific computing at its finest, with the Gmpy2 library. In the table below, Ln represents the length of a run of zeros starting at position n, featuring the successive records only (longest runs).
To not miss future articles, sign-up (for free) to my newsletter, here.
Author
Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com, former VC-funded executive, author and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET.
Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS). He published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is the author of multiple books, including “Synthetic Data and Generative AI” (Elsevier, 2024). Vincent lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory. He recently launched a GenAI certification program, offering state-of-the-art, enterprise grade projects to participants.
To prove this conjecture, we can use a proof by contradiction. Assume that the longest run of zeros that can potentially occur in the binary expansion of SQRT(2) starting at position n is greater than log2 n. This means that there exists a binary sequence of length n+log2 n that starts with a run of log2 n or more zeros, followed by a nonzero digit. We can then construct a new binary sequence by replacing the first log2 n zeros with log2 n ones. This new sequence will also represent the number SQRT(2), but it will have a shorter run of zeros at the beginning. This contradicts the assumption that the original sequence had the longest possible run of zeros at the beginning.