A Java Programmer’s Guide to Random Numbers, Part 1: Beyond java.util.Random

Posted in Evolutionary Computation, Java by Dan on April 3rd, 2008

More than you ever wanted to know about randomness in Java programs.

This is the first in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security and Integrity (when not losing money depends on nobody knowing what happens next)

Part 1: Beyond java.util.Random

Random numbers are useful in a wide variety of software applications. They provide a crucial element of uncertainty in an otherwise deterministic world. Without random numbers in computers, the hugely popular online poker industry would not exist, video games would be boring and predictable, iTunes would have no shuffle, cryptography would be much more difficult, and many innovative algorithms, such as those used in artificial intelligence and evolutionary computation, simply couldn’t work.

True Random Numbers and Pseudorandom Numbers

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” – John von Neumann

Before continuing, it is important to make the distinction between so-called “true” random numbers and pseudorandom numbers. Though it may often seem otherwise to us as programmers, computer systems are infuriatingly deterministic. They are generally incapable of doing things randomly. To get a computer to behave in a way that is truly random, it is necessary to introduce some non-deterministic input. We could have somebody constantly rolling dice or flipping coins and typing in the results, but that is not very practical. A slightly more feasible approach is to construct a device that observes some real world phenomenon that is known to be unpredictable, such as radioactive decay or atmospheric noise. Data extracted from these events can be used a source of entropy for our applications. You could purchase a device that plugs into a serial port or USB port. To access these devices from Java you’d probably have to use C/C++ and JNI. Alternatively, you could get true random numbers indirectly – from an online service such as Random.org or Hotbits.

Since we can get truly unpredictable random numbers from this kind of hardware, why don’t we satisfy all of our randomness requirements in this way? Well the primary problem is throughput. These devices are quite limited in the quantity of randomness they can produce. They simply aren’t fast enough for many uses.

Pseudorandom numbers are not really random at all. They are the result of deterministic mathematical formulae. The best of these algorithms have been devised so that their output is statistically indistinguishable from true random numbers. PRNGs start with a single numeric seed value. The algorithm is applied to this seed to generate the output and a new, updated seed that is used to generate the next value. The mathematics involved is beyond the scope of this article – the definitive guide is the second volume of Donald Knuth’s The Art of Computer Programming.

An interesting property of this approach is that if you always start with the same seed value, you will always get the same sequence of “random” numbers. Though this can occasionally be useful, you would normally strive to avoid using the same seed value in the interests of unpredictability. A simple approach that is sufficient in many cases is to seed the PRNG from the current value of the system clock.

Aside from speed, another advantage of pseudorandom number generators (PRNGs) over true random number generators (TRNGs) is that they are more predictably unpredictable. That is, the statistical properties of PRNGs are more reliable than is the case with TRNGs.

Statistical Quality

xkcd.com

Why not java.util.Random?

The Java API helpfully provides a PRNG that we can use, the java.util.Random class. If you are a Java programmer you will almost certainly have used it at some point. For non-critical random numbers, e.g. adding some unpredictability to a game, it’s fine. It’s even pretty quick. Unfortunately, it’s not random enough – not by the standards required for more serious applications.

The problem of deciding whether a sequence of numbers is random is not an easy one to solve. You can’t simply look at the output and decide that it doesn’t look random enough. After all, if you toss a coin ten times, it could randomly come up heads every time, even though the probability of this sequence is pretty small. To get any kind of meaningful evaluation requires a large sample of the RNG output (perhaps millions of generated values). This sample can then be subjected to sophisticated statistical analysis.

Probably the best known test suite for random number generators is George Marsaglia’s Diehard Battery of Tests of Randomness. Diehard says that java.util.Random is not sufficiently random. But you don’t have to interpret Diehard’s complicated reports to see for yourself. This applet demonstrates it visibly.

Performance

So if not java.util.Random, how about Java’s other RNG, java.security.SecureRandom? SecureRandom is built for cryptography so it is specifically designed to avoid any such flaws. Diehard reports no issues with its output. Unfortunately, this quality comes at a high price – performance. In benchmarks, SecureRandom can be up to 60 times slower at generating random numbers than java.util.Random. This is bearable if you are only generating random values infrequently, but if your program relies on generating random numbers non-stop and as fast as possible (as many simulations do) then it’s a show-stopping bottleneck.

The good news is that, beyond the core API, there are random number generators as fast as (or faster than) java.util.Random with statistical properties as good as SecureRandom. The Uncommons Maths project provides a comprehensive open source random numbers package. Uncommons Maths provides three Random Number Generators, with different properties, for a wide variety of applications. Unlike java.util.Random, each of these RNGs completes the Diehard suite without any problems. Additionally, each of the Uncommons Maths RNGs is a sub-class of java.util.Random, which means any of them can be used as a drop-in replacement with minimal effort.

A good general-purpose PRNG is the MersenneTwisterRNG class. It is a pure Java port of Makoto Matsumoto and Takuji Nishimura’s proven and ultra-fast Mersenne Twister PRNG for C. Even faster is CellularAutomatonRNG, the Java port of Tony Pasqualoni’s experimental cellular automaton PRNG.

In the next article: Beyond dice and coins – using the right probability distribution.