Detecting Subtle Departures from Randomness

Entitled “Detecting Subtle Departures from Randomness”, the full version in PDF format is accessible in the “Free Books and Articles” section, here.

Figure 1 below shows two plots arising from two different, non-periodic pseudo-random sequences of +1 and -1 in about equal proportions, distributed seemingly randomly. The left one is associated to prime numbers, assumed to be distributed somewhat randomly. The right one is based on the binary digits of the square root of 2, also assumed to be distributed somewhat randomly.

Figure 1: Uncovering non-randomness

The test proposed here clearly magnifies the lack of randomness of the first sequence, while unable to find patterns in the latter. The implications are significant: prime numbers are not distributed as randomly as most people think. A consequence is that pretty much all congruential pseudo-random number generators, which rely on properties of the prime numbers in one way or another, do not produce perfectly random sequences, and may be hacked. This includes the Mersenne twister implemented in Python and other programming languages. It makes many cryptographic systems unsecure, unless they are reinforced.

I discuss, in simple English, how to detect weak deviations from randomness, and workarounds to get better random-looking and unbreakable sequences. The theoretical background is related to the famous unsolved Riemann Hypothesis in number theory. This article also offers a strong, state-of-the-art summary on this topic, accessible to machine learning practitioners or beginners, and to decision makers in the field. The topic is usually explained in obscure jargon or inane generalities. To the contrary, this article will intrigue you with the beauty and power of this theory.

Abstract

I discuss a new test of randomness for pseudo random number generators (PRNG), to detect subtle patterns in binary sequences. The test shows that congruential PRNGs, even the best ones, have flaws that can be exacerbated by the choice of the seed. This includes the Mersenne twister used in many programming languages including Python. I also show that the digits of some numbers such as the square root of 2205, conjectured to be perfectly random, fail this new test, despite the fact that they pass all the standard tests. I propose a methodology to avoid these flaws, implemented in Python. The test is particularly useful when high quality randomness is needed. This includes cryptographic and military-grade security applications, as well as synthetic data generation and simulation-intensive Markov chain Monte Carlo methods.

The origin of this test is in number theory and connected to the Riemann Hypothesis. In particular, it is based on Rademacher stochastic processes. These random multiplicative functions are a number-theoretic version of Bernoulli trials. My article features state-of-the-art research on this topic, as well as an original, simple, integer-based formula to compute square roots to generate random digits. It is offered with a Python implementation that handles integers with millions of digits.

Table of Contents

Introduction

Strong Pseudo-random numbers

  • New test of randomness for PRNG’s
  • Theoretical background: the law of the iterated logarithm
  • Connection to the Generalized Riemann Hypothesis

Testing well-known sequences

  • Reverse-engineering a pseudo-random sequence
  • Illustrations

Python code

  • Fixes to the faulty random function in Python
  • Prime test implementation to detect subtle flaws in PRNG’s
  • Special formula to compute 10 million digits of square root of 2

References

Download the Article

The technical article, entitled Detecting Subtle Departures from Randomness, is accessible in the “Free Books and Articles” section, here. The text highlighted in orange in this PDF document are keywords that will be incorporated in the index, when I aggregate all my related articles into a single book about innovative machine learning techniques. The text highlighted in blue corresponds to external clickable links, mostly references. And red is used for internal links, pointing to a section, bibliography entry, equation, and so on.

To not miss future articles, sign-up to our newsletter, here.

About the Author

Vincent Granville is a pioneering data scientist and machine learning expert, co-founder of Data Science Central (acquired by  TechTarget in 2020), former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, InfoSpace. Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS).  

Vincent published in Journal of Number TheoryJournal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of multiple books, available here. He lives  in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.

%d bloggers like this: