More than you ever wanted to know about randomness in Java programs.
This is the second in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:
- True Random Numbers and Pseudorandom Numbers
- Statistical Quality (making sure the dice are not loaded)
- Performance (because sometimes you really do need half a million random numbers a second)
- Different kinds of randomness (because not everything can be modelled by tossing a coin)
- Degrees of Freedom (is every possible outcome actually possible?)
- Security and Integrity (when not losing money depends on nobody knowing what happens next)
Part 2: Not just coins and dice
Part 1 of this series discussed different kinds of random number generators (RNGs), highlighted the issues with using the default Java RNGs (java.util.Random
and java.security.SecureRandom
) and introduced the Uncommons Maths library, which provides three fast, high-quality alternative Java RNGs.
But there is more to the random numbers package than just replacement RNGs. Uncommons Maths also provides convenient classes for working with several different probability distributions, which is pretty much essential for modelling anything other than the most simple random processes.
The Uncommons Maths random numbers demo application (WebStart required) demonstrates many of the concepts discussed in this article. It also serves as an example of the performance differences between different RNGs as discussed in the previous article.
Different Kinds of Randomness
Uniform Distribution
The Uniform Distribution is what you get by default from java.util.Random
(and its sub-classes – including the Uncommons Maths RNGs). The discrete uniform distribution says that there are a certain number of possible outcomes and each is equally likely. The obvious example is rolling a dice. You can get a 1, 2, 3, 4, 5 or 6 and no single outcome is more likely than any of the others. If you record the results of 6 million dice rolls you would expect to get pretty close to a million of each number.
The uniform distribution is useful for some problems, such as shuffling a deck of cards, but it is not a good choice for many other scenarios that don’t involve a set of equally likely outcomes.
Normal (Gaussian) Distribution
The Normal Distribution (also known as the Gaussian Distribution) is the familiar bell curve. It is a good model for many real world phenomena. In a normal distribution, the average outcome is more likely than any other outcome. A slightly above or below average outcome is almost as likely, and extreme outcomes, while still possible, are very unlikely.
An example of this would be the distribution of IQ test scores. Most people would achieve somewhere close to an average score (maybe slightly above or below), but a few people would achieve very low scores and a similarly small number would achieve very high scores.
The nextGaussian()
method of the java.util.Random
class provides rudimentary support for obtaining normally-distributed values. Uncommons Maths’ GaussianGenerator
class builds on this, making it easy to create distributions with different means and standard deviations. The standard deviation parameter controls how spread out the values are. A low standard deviation means most values will be very close to the mean (on either side). A high standard deviation increases the likelihood that a value will be a long way from the mean.
GaussianGenerator
is a wrapper around a standard RNG (you can use any of the Uncommons Maths RNGs, or even one of the standard Java ones if you so choose). Once you have created a GaussianGenerator
by specifying a mean, standard deviation and source RNG, you can call the nextValue()
method ad infinitum:
Random rng = new MersenneTwisterRNG(); GaussianGenerator gen = new GaussianGenerator(100, 15, rng); while (true) { System.out.println(gen.nextValue()); } |
If we were to generate thousands of simulated IQ test scores with this code, we would see that the vast majority of scores would be close to the average (100). Around 68% of the values would be within one standard deviation (15) of the mean (i.e. in the range 85 – 115), and approximately 95% would be within two standard deviations (between 70 and 130). These percentages are a well-known property of normal distributions.
The demo shows how the distribution of generated values compares to a theoretical normal distribution (the more values you generate, the closer the fit will be).
Binomial Distribution
The probability of a coin flip coming up heads is 0.5. This is pretty simple – we can just use the uniform distribution to simulate a single coin flip. But what is the probability of getting exactly 7 heads from 10 coin flips? This is where the Binomial Distribution comes in. It provides probabilities for the outcome of a series of trials, where each trial has two possible results – success or failure. The binomial distribution tells us that the probability of obtaining heads 7 times from 10 fair coin flips is about 0.12.
Suppose, for example, we wanted to randomly generate the number of times the number 6 occurred in 100 dice rolls. We could simulate the full series of trials using a uniform RNG, but this is cumbersome:
Random rng = new MersenneTwisterRNG(); int sixes = 0; for (int i = 0; i < 100; i++) { int diceRoll = rng.nextInt(6) + 1; if (diceRoll == 6)) { ++sixes; } } System.out.println("No. of 6s from 100 rolls is " + sixes); |
It is much easier, and more efficient, to use the binomial distribution. If we consider an occurrence of a 6 to be a “success”, then the probability of success is 1/6 (or ~0.1667). If we plug this value and the number of trials (100) into the Uncommons Maths BinomialGenerator
class, we can get our answer from a single method call:
Random rng = new MersenneTwisterRNG(); BinomialGenerator gen = new BinomialGenerator(100, 1d/6, rng); int sixes = gen.nextValue(); System.out.println("No. of 6s from 100 rolls is " + sixes); |
Poisson Distribution
Suppose a telephone call centre between 2pm and 4pm in the afternoon receives calls at average of three per minute. What is the probability that we receive exactly five calls in the next minute? This is the kind of question that we can answer with the Poisson Distribution. The Poisson distribution is not dissimilar to the Binomial distribution, as you will see if you plot them. The difference is that the Poisson distribution is concerned with how many independent events occur within a set interval. Whereas values from a binomial distribution have an upper bound determined by the number of trials, there is no upper bound for a Poisson distribution. The probability of five calls in a minute when the average is three is around 0.17 according to the Poisson distribution.
In simulation applications we can use the PoissonGenerator
class (its single parameter is the mean) to determine how many times something happens within a given interval.
Exponential Distribution
The Exponential Distribution is related to the Poisson Distribution. Rather than modelling how many events occur in a fixed period of time, it models the period of time between two independent events that occur at a given average rate.
Suppose you wanted to simulate some random event that happened on average 10 times a minute – perhaps you are simulating load for a server. You could simply generate events at the constant rate of exactly one every 6 seconds. But the real world is rarely this metronomic. A more realistic approach would be to generate events randomly, using an exponential distribution (provided by the ExponentialGenerator
class) to determine the intervals between events:
final long oneMinute = 60000; Random rng = new MersenneTwisterRNG(); // Generate events at an average rate of 10 per minute. ExponentialGenerator gen = new ExponentialGenerator(10, rng); boolean running = true; while (true) { long interval = Math.round(gen.nextValue() * oneMinute); Thread.sleep(interval); // Fire event here. } |
In a future article: Shuffling the deck – degrees of freedom, seeding and security concerns.