A Java Programmer’s Guide to Random Numbers, Part 1: Beyond java.util.Random

Posted in Evolutionary Computation, Java by Dan on April 3rd, 2008

More than you ever wanted to know about randomness in Java programs.

This is the first in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security and Integrity (when not losing money depends on nobody knowing what happens next)

Part 1: Beyond java.util.Random

Random numbers are useful in a wide variety of software applications. They provide a crucial element of uncertainty in an otherwise deterministic world. Without random numbers in computers, the hugely popular online poker industry would not exist, video games would be boring and predictable, iTunes would have no shuffle, cryptography would be much more difficult, and many innovative algorithms, such as those used in artificial intelligence and evolutionary computation, simply couldn’t work.

True Random Numbers and Pseudorandom Numbers

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” – John von Neumann

Before continuing, it is important to make the distinction between so-called “true” random numbers and pseudorandom numbers. Though it may often seem otherwise to us as programmers, computer systems are infuriatingly deterministic. They are generally incapable of doing things randomly. To get a computer to behave in a way that is truly random, it is necessary to introduce some non-deterministic input. We could have somebody constantly rolling dice or flipping coins and typing in the results, but that is not very practical. A slightly more feasible approach is to construct a device that observes some real-world phenomenon that is known to be unpredictable, such as radioactive decay or atmospheric noise. Data extracted from these events can be used as a source of entropy for our applications. You could purchase a device that plugs into a serial port or USB port; to access these devices from Java you’d probably have to use C/C++ and JNI. Alternatively, you could get true random numbers indirectly – from an online service such as Random.org or Hotbits.

Since we can get truly unpredictable random numbers from this kind of hardware, why don’t we satisfy all of our randomness requirements in this way? Well, the primary problem is throughput. These devices are quite limited in the quantity of randomness they can produce; they simply aren’t fast enough for many uses.

Pseudorandom numbers are not really random at all. They are the result of deterministic mathematical formulae. The best of these algorithms have been devised so that their output is statistically indistinguishable from true random numbers. PRNGs start with a single numeric seed value. The algorithm is applied to this seed to generate the output and a new, updated seed that is used to generate the next value. The mathematics involved is beyond the scope of this article – the definitive guide is the second volume of Donald Knuth’s The Art of Computer Programming.

An interesting property of this approach is that if you always start with the same seed value, you will always get the same sequence of “random” numbers. Though this can occasionally be useful, you would normally strive to avoid using the same seed value in the interests of unpredictability. A simple approach that is sufficient in many cases is to seed the PRNG from the current value of the system clock.
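This repeatability is easy to demonstrate with java.util.Random (a minimal sketch; the seed values here are just for illustration):

```java
import java.util.Random;

public class SeedDemo {
    public static void main(String[] args) {
        // Two PRNGs created with the same seed produce identical sequences.
        Random r1 = new Random(42);
        Random r2 = new Random(42);
        System.out.println(r1.nextInt() == r2.nextInt()); // true

        // Seeding from the system clock gives a different sequence each run.
        Random r3 = new Random(System.currentTimeMillis());
        System.out.println(r3.nextInt());
    }
}
```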

Aside from speed, another advantage of pseudorandom number generators (PRNGs) over true random number generators (TRNGs) is that they are more predictably unpredictable. That is, the statistical properties of PRNGs are more reliable than is the case with TRNGs.

Statistical Quality


Why not java.util.Random?

The Java API helpfully provides a PRNG that we can use, the java.util.Random class. If you are a Java programmer you will almost certainly have used it at some point. For non-critical random numbers, e.g. adding some unpredictability to a game, it’s fine. It’s even pretty quick. Unfortunately, it’s not random enough – not by the standards required for more serious applications.

The problem of deciding whether a sequence of numbers is random is not an easy one to solve. You can’t simply look at the output and decide that it doesn’t look random enough. After all, if you toss a coin ten times, it could randomly come up heads every time, even though the probability of that particular sequence is small (1 in 2^10, i.e. 1 in 1024). To get any kind of meaningful evaluation requires a large sample of the RNG output (perhaps millions of generated values). This sample can then be subjected to sophisticated statistical analysis.

Probably the best known test suite for random number generators is George Marsaglia’s Diehard Battery of Tests of Randomness. Diehard says that java.util.Random is not sufficiently random. But you don’t have to interpret Diehard’s complicated reports to see for yourself. This applet demonstrates it visually.

Performance

So if not java.util.Random, how about Java’s other RNG, java.security.SecureRandom? SecureRandom is built for cryptography, so it is specifically designed to avoid the kind of statistical flaws that trip up java.util.Random. Diehard reports no issues with its output. Unfortunately, this quality comes at a high price – performance. In benchmarks, SecureRandom can be up to 60 times slower at generating random numbers than java.util.Random. This is bearable if you are only generating random values infrequently, but if your program relies on generating random numbers non-stop and as fast as possible (as many simulations do) then it’s a show-stopping bottleneck.
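The gap is easy to observe with a crude micro-benchmark sketch like the one below (absolute timings vary by JVM, platform and SecureRandom provider, so treat the numbers as indicative only):

```java
import java.security.SecureRandom;
import java.util.Random;

public class RngBenchmark {
    public static void main(String[] args) {
        final int n = 1000000;
        System.out.println("java.util.Random: " + time(new Random(), n) + " ms");
        System.out.println("SecureRandom:     " + time(new SecureRandom(), n) + " ms");
    }

    // Returns the elapsed time in milliseconds for n calls to nextInt().
    static long time(Random rng, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            rng.nextInt();
        }
        return System.currentTimeMillis() - start;
    }
}
```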

The good news is that, beyond the core API, there are random number generators as fast as (or faster than) java.util.Random with statistical properties as good as SecureRandom. The Uncommons Maths project provides a comprehensive open source random numbers package. Uncommons Maths provides three random number generators, with different properties, for a wide variety of applications. Unlike java.util.Random, each of these RNGs completes the Diehard suite without any problems. Additionally, each of the Uncommons Maths RNGs is a subclass of java.util.Random, which means any of them can be used as a drop-in replacement with minimal effort.

A good general-purpose PRNG is the MersenneTwisterRNG class. It is a pure Java port of Makoto Matsumoto and Takuji Nishimura’s proven and ultra-fast Mersenne Twister PRNG for C. Even faster is CellularAutomatonRNG, the Java port of Tony Pasqualoni’s experimental cellular automaton PRNG.
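Because these generators extend java.util.Random, swapping one in is just a matter of changing a constructor call. The sketch below illustrates the drop-in principle using the JDK’s own SecureRandom (also a Random subclass) so that it compiles without any extra jars; a MersenneTwisterRNG would be substituted in exactly the same way:

```java
import java.security.SecureRandom;
import java.util.Random;

public class DropInDemo {
    // Code written against java.util.Random accepts any subclass.
    static int rollDie(Random rng) {
        return 1 + rng.nextInt(6);
    }

    public static void main(String[] args) {
        // Swap in a MersenneTwisterRNG (or any other Random subclass)
        // here without touching rollDie at all.
        int roll = rollDie(new SecureRandom());
        System.out.println(roll);
    }
}
```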

In the next article: Beyond dice and coins – using the right probability distribution.

Getters, Setters and the Great Coverage Conspiracy

Posted in Java, Software Development by Dan on April 1st, 2008

A frequent topic of Java-related blogs is whether it is worthwhile to write unit tests for simple getters and setters. This posting that I came across today proposes a reflection-based trick for eliminating much of the work in writing these tests. Maybe this is an improvement over other approaches, but what bothers me most is the motivation for wanting to test getters and setters in the first place.

It seems that many of those advocating unit testing simple getters and setters are driven by a desire to improve their coverage scores, with the actual utility of the tests being a secondary concern.

Firstly, I should state that I am absolutely in favour of measuring coverage for test suites. In fact, I think it’s pretty much essential. If you are writing automated tests but not measuring code coverage then you are just scratching around in the dark. What’s great about coverage reports, particularly those that show branch coverage as well as line coverage, is that you get to see exactly where your tests are neglecting certain scenarios. Coverage reports can also be useful in highlighting code that is not used and can be removed.

The problem with code coverage is that it only shows where your tests are weak. It does not prove that your tests are good, even if the coverage is 100%. So writing tests with the sole aim of improving the coverage score is merely an exercise in self-deception. It’s the tail wagging the dog.

If you need to add tests for all your getters and setters in order to achieve x% code coverage, where x is some mandated target, there are two questions you need to ask:

  1. Do you have too many getters and setters?
  2. Are you avoiding testing difficult code?

I could go on for pages about the first point. There are far too many getters and setters in most Java code. Too many developers think encapsulation is simply a case of making fields private and providing access to them with getters and setters. It would be better to aim for making fields private and not providing access to them with getters and setters. Favouring constructor-based dependency injection over setter-based DI is something else to consider (although that’s a whole other article in the making…).

How do you know if you have too many getters and setters? Well your coverage reports are a good starting point. If the getters and setters are essential to your application, it will be just about impossible to avoid exercising them indirectly from other tests. If you have good coverage elsewhere but the getters and setters aren’t touched, chances are they aren’t needed. Adding more tests is not the only way of improving your test coverage. Another way is to remove code so that you have less to test.

The second question above is also important. If you require your team to achieve a rigid 75% test coverage target then you are almost guaranteeing that you will get tests for the 75% of the application that is easiest to test. Writing tests for getters and setters helps to fulfil the 75% requirement without needing to think about how to test the difficult bits of the system. Unfortunately, the other 25% is probably the code that really needs testing/refactoring.

For me it’s pretty clear. Don’t write unit tests for getters and setters. Better still, don’t write getters and setters (except where necessary). And don’t confuse test-driven development with coverage-driven development.

Improved HTML reports for TestNG – ReportNG 0.9.5

Posted in Java by Dan on March 24th, 2008

ReportNG 0.9.5 is now available for download. ReportNG provides alternative HTML and XML reporting plugins for the TestNG test framework.

This release addresses all of the most important outstanding issues in the issue tracker. This includes proper handling of tests that use TestNG’s DataProvider feature, better escaping of special characters in XML and HTML output, and information about the test groups relevant to each test suite.

Thanks to everybody who contributed to this release by providing feedback and suggestions for the previous releases. Any new bug reports and enhancement requests can be entered into the issue tracker.

ReportNG 0.9.3 – XHTML and XML Reporting for TestNG

Posted in Java by Dan on March 5th, 2008

The latest version of ReportNG (version 0.9.3), a reporting plug-in for TestNG, is available for download. ReportNG generates XHTML reports and/or JUnit-format XML reports (for importing TestNG results into tools that recognise the JUnit format).

This is a bug-fix release that addresses a couple of problems with escaping of strings in the JUnit XML reporter.

ReportNG Updated – Version 0.9.2

Posted in Java by Dan on February 3rd, 2008

ReportNG, the reporting plug-in for TestNG, is now up to version 0.9.2. The JUnit XML reporter introduced previously is now complete, and there are also a few minor enhancements to the HTML report.

Maven Revisited: Fallacies and usefulness

Posted in Java, Software Development by Dan on February 3rd, 2008

The eternal love/hate debate around Maven has resurfaced again recently, triggered by Howard Lewis Ship’s denouncement of Maven following his experiences on the Tapestry project (InfoQ has a summary of the recent discussions). I’ve touched on this previously with a brief echoing of Charles Miller’s sentiments.

Meanwhile, Don Brown recently announced his one-man crusade to inject some much-needed common sense into how Maven works (just addressing the first 4 items on his list would make things much better). Don followed this up by applying yet more common sense, this time to how Maven works with plugins.

All of this activity and debate prompted me to reconsider exactly what it is about Maven that makes me avoid it wherever possible. It’s certainly not the convention over configuration. Maven’s conventions might not be to everyone’s taste, but the reasoning is sound (I employ a similar convention-over-configuration approach to Ant builds, using macros). The lack of documentation is definitely a contributing factor, and it’s certainly buggier than Ant. But in the end it all comes down to the dependency management and Charles Miller’s inescapable conclusion that it is “broken by design”.

Maven’s fondness for project dependencies scattered across continents fails to take into account many of the 8 Fallacies of Distributed Computing. Even ignoring the trust issues stemming from the outsourcing of your dependency-management, the more dependencies you have, the less likely you are to be able to access all of them at any given point in time. There will be failures beyond your control at some point.

Assuming that all of the repositories are available, you’ve then got to hope that somebody doesn’t upgrade a library that you depend on and introduce a bug into your application. If you haven’t taken the precautions to specify an exact version number, you could be in trouble. And if you have taken the precautions, why persist with this fragile nonsense? Why not just put the JAR files in the version control repository and remove most of the potential failure points? The same question occurred to Don Brown and he’s doing something about it (it’s point number 4 on his list).

So far, so negative. Are there any redeeming features of Maven? Well there is one use case where the dependency management actually makes a fair bit of sense: internal dependencies. By internal dependencies, I mean dependencies within an organisation. If your development team has multiple projects and there are dependencies between them, the Maven approach could be the solution. In this scenario, everything is within your control. The software is written by your developers and hosted on your servers that are managed by your admins.

This dynamic approach to dependency management is more flexible than copying and pasting different versions of source and/or binaries between projects. And if you limit it to internal projects, you can eliminate most of the downsides. Of course, you don’t have to use Maven for this. You may want to look at Ivy instead.

ReportNG 0.9.0 – HTML and XML Reports for TestNG

Posted in Java by Dan on January 24th, 2008

I’ve just released version 0.9.0 of ReportNG, my reporting plug-in for TestNG. This version includes several improvements over the previous release. Most importantly, it addresses the biggest problem with the last version – the lack of information about configuration failures. Previously you had to increase the TestNG verbosity and dig around in the console output. Now config failures are displayed prominently above the test results.

There is also more information about skipped tests now too. Dependencies on groups and methods are listed for tests that have been skipped so that you can figure out why they were skipped.

Finally, aside from a few other cosmetic improvements to the HTML, I have introduced an experimental JUnit XML reporter. I had some problems getting the output that I wanted from the one bundled with TestNG. Mine generates one XML file per test class, which is closer to what you get from JUnit. Now you have the option of using whichever is most suitable for you (the default TestNG reporter or this one). The JUnit XML reporter should be considered incomplete (it doesn’t include detailed test case information yet), but it is useful enough for Hudson to generate summary reports.

If you have any feedback, either post a comment below or use the issue tracker.

JavaNG: A Better Java – What would you do?

Posted in Java by Dan on January 4th, 2008

Forget the shackles of backwards-compatibility. If you could make changes to Java to make it a better language, what would you do? I don’t mean drastic surgery so that you end up with Lisp on the JVM, but what could be done to make Java a better platform-independent, object-oriented language?

Bruce Eckel has some interesting thoughts about the evolution of Java and Sun’s commitment to backwards-compatibility. I have to agree that backwards compatibility will eventually kill Java if new features are continuously compromised by nasty hacks (e.g. erasure) to avoid breaking legacy code. At some point, the guardians of Java must choose new features or backwards-compatibility. To continue to demand both is not sustainable.

With this in mind, I’ve made a list of things that I personally would like to improve in Java, assuming that backwards-compatibility is not a primary concern. I’m interested to hear others’ suggestions, so leave a comment at the end if you have better ideas.

If it weren’t for Sun’s marketing department, we could have called it Java 2. Java++ is just wrong, so I’ll follow Cedric’s naming scheme and call it JavaNG – the next generation OO language for the JVM.

Maybe it’s just me, but this really bugs me…

Firstly, and it may seem trivial to most, but I would scratch an itch that has been bothering me for over a decade. The Cloneable interface does not define the clone() method. An object must define a public clone method in order to be cloneable, but the Cloneable interface does not require this. Therefore it is perfectly legal to implement Cloneable yet not be cloneable. Of course, any class that does this is obviously broken, yet the Cloneable interface apparently cannot be changed since such a class would no longer compile. In my opinion, Sun would be doing developers a favour by making the compiler complain about this. FFS Sun, you broke everybody’s code by adding the assert keyword in 1.4, why would fixing Cloneable be such a big deal?
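A minimal illustration of the problem – this compiles without so much as a warning:

```java
public class CloneableDemo {
    // Perfectly legal: implements Cloneable but exposes no clone() method.
    static class Broken implements Cloneable {
        // Object.clone() remains protected, so code outside this class
        // has no way to actually clone a Broken instance.
    }

    public static void main(String[] args) {
        Broken broken = new Broken();
        System.out.println(broken instanceof Cloneable); // true, but useless
        // broken.clone(); // would not compile: clone() has protected access
    }
}
```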

Is this an object-oriented language or not?

Right, now on to more substantial concerns. Primitives have to go. Everything should be an object. It would simplify learning the language and primitives don’t mix with generics without auto-boxing (and auto-boxing is horrific). No primitives – no auto-boxing – no problem.
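To see why auto-boxing deserves that description, consider this sketch – the inconsistent behaviour of == is a direct consequence of mixing primitives and objects:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        // Auto-boxing makes == behave inconsistently: values from -128 to
        // 127 are boxed to cached Integer objects, larger values usually
        // are not.
        Integer a = 127, b = 127;
        Integer c = 128, d = 128;
        System.out.println(a == b); // true  (same cached object)
        System.out.println(c == d); // typically false (distinct objects)
    }
}
```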

I believe that primitives were originally included in the language because of performance concerns about doing everything with objects. Well virtual machines and hardware have both moved on significantly since the mid-90s. And if Ruby seriously is the main competition, then performance clearly doesn’t matter anyway.

BigDecimal.ONE.add(BigDecimal.ONE).compareTo(new BigDecimal("2")) == 0

Obviously if there are no primitives to unbox to, then the arithmetic operators have to be defined to work on the appropriate object types (java.lang.Integer, java.lang.Double, etc.). But why stop there? Arithmetic operators for BigInteger and BigDecimal are long overdue. Without first-class support for arbitrary-precision arithmetic, Java is not a sensible choice for many types of application. I mean, how hard should it be to assert that 1 + 1 = 2?
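Spelled out in full, the expression above becomes the following – note that compareTo() is needed because equals() compares scale as well as value:

```java
import java.math.BigDecimal;

public class BigDecimalDemo {
    public static void main(String[] args) {
        // Today's Java: asserting that 1 + 1 = 2 with arbitrary precision.
        BigDecimal sum = BigDecimal.ONE.add(BigDecimal.ONE);
        System.out.println(sum.compareTo(new BigDecimal("2")) == 0); // true

        // equals() is stricter than compareTo(): same value, different
        // scale, so these are not "equal".
        System.out.println(new BigDecimal("2.0").equals(new BigDecimal("2"))); // false
    }
}
```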

Types are for life, not just for compilation

I want generics without erasure. Since I can accept breaking backwards compatibility where necessary, I don’t have to accept a weaselly implementation compromise and the resulting frustrations of not being able to inspect type information at runtime.
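A short sketch of exactly what erasure throws away:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();

        // With erasure, both lists have the same runtime class...
        System.out.println(strings.getClass() == ints.getClass()); // true

        // ...and the element type cannot be inspected at runtime:
        // if (strings instanceof List<String>) { ... }  // does not compile
    }
}
```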

Just say no to null

For a language that supposedly doesn’t have pointers, there are an awful lot of NullPointerExceptions in Java programs. The simple truth is that developers have proven to be incapable of keeping track of which references are allowed to be null. I want the language to do this for me. More specifically, I want the type system to do this for me. The Nice programming language has the concept of option types. I want one of those. If my method declares that its parameter cannot be null, then any attempt to pass it null (or even to pass it a value of a nullable type) should result in a compiler error. Non-nullable types will become the new ‘final fields’. They’ll be used everywhere unless there is a good reason not to.
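For contrast, here is the best today’s Java can offer – a manual runtime check in a hypothetical length() method, where an option type would turn the same mistake into a compile-time error:

```java
public class NullDemo {
    // In today's Java, the null contract is enforced only at runtime;
    // a non-nullable parameter type would reject null at compile time.
    static int length(String s) {
        if (s == null) {
            throw new IllegalArgumentException("s must not be null");
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(length("hello")); // 5
        // length(null) compiles fine and only fails when it runs.
    }
}
```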

Literal bias

Java’s verbose, they say. Well, Java will always look a bit wordy when compared to Python with its sleek dictionary syntax. But we could have that too. Literals for maps and lists would help to make code more concise and more readable.
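As a sketch of the verbosity in question, here is what a hypothetical map literal such as {"en": "hello", "fr": "bonjour"} must be spelled out as today (the greetings map is invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralDemo {
    public static void main(String[] args) {
        // No literal syntax: construct, then populate, one entry at a time.
        Map<String, String> greetings = new HashMap<String, String>();
        greetings.put("en", "hello");
        greetings.put("fr", "bonjour");
        System.out.println(greetings.get("fr")); // bonjour
    }
}
```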

Tuples

According to Gilad Bracha, Java was supposed to have had tuples from the start; they just never made it into the language. It’s time to rectify that. No more writing several trivial little classes (or generic Pair and Triple classes) just to wrap together multiple return values.
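This is the kind of boilerplate a tuple type would eliminate – a hand-rolled generic Pair just to return two values (minMax is a made-up example):

```java
public class PairDemo {
    // The trivial wrapper class that tuples would make unnecessary.
    static class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Returns the smallest and largest elements of a non-empty array.
    static Pair<Integer, Integer> minMax(int[] values) {
        int min = values[0], max = values[0];
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new Pair<Integer, Integer>(min, max);
    }

    public static void main(String[] args) {
        Pair<Integer, Integer> result = minMax(new int[]{3, 1, 4, 1, 5});
        System.out.println(result.first + ".." + result.second); // 1..5
    }
}
```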

The stuff that didn’t make it…

I think that list is long enough for now (though I’m sure I’ll think of more later). You’ll notice that I haven’t taken a position on closures. I’m unconvinced and undecided. I need to read the proposals properly. In the past I’ve advocated first-class functions for Java, but for now I’m deferring on that too, at least until I’ve picked a side in the closures debate. I also think that the original Java designers got it right when they left multiple inheritance and operator-overloading in C++ land.

If you think you have a better plan for JavaNG, please add a comment. I’ll post a follow-up in a few days summarising the suggestions.

Developing with SoyLatte – First Impressions

Posted in Java, Mac by Dan on December 29th, 2007

Apple recently released an updated preview of Java 6, but only for 64-bit Intel Macs running Leopard. With the company remaining stubbornly silent on its future Java plans, nobody knows whether 32-bit Macs or the Tiger operating system will ever be supported.

For those developers who don’t want to be forced to upgrade to Leopard or to discard their perfectly adequate Core Duo Macs, there is an alternative to playing Apple’s waiting game thanks to the efforts of Landon Fuller and his SoyLatte project.

Getting Started

SoyLatte is an X11-based port of the FreeBSD patchset for Java 6. This gives you a non-Apple JDK solution right now and on your terms (no hardware or software upgrade required). The only thing that SoyLatte lacks is Apple’s polished native look-and-feel for Swing and AWT applications (there is no seamless integration with the OS X desktop). This is because it uses X11 for display purposes (in the same way that the Mac port of OpenOffice works). If you are doing non-GUI development none of this will matter to you.

Download and installation of SoyLatte is straightforward. It’s just an archive, containing the pre-built JDK binaries, that can be dropped anywhere you please (mine is under /usr/local). Change your environment variables (JAVA_HOME, PATH, etc.) as appropriate and you are ready to go for command-line development.

Developing

My reason for requiring Java 6 was to build and hack Hudson. Hudson is Kohsuke Kawaguchi’s open source, extensible continuous integration server. Hudson will run on Java 5 but it requires Java 6 for development. To make Mac-based development on Hudson feasible I would have to get SoyLatte to play nicely with IntelliJ IDEA and Maven.

Integrating SoyLatte with IDEA was a piece of cake. It’s just a matter of adding a new JDK in the settings dialog (just point it at your installation directory). I did have to go back and manually add the SoyLatte JRE libs to the JDK classpath as these were not picked up automatically (resulting in the IDE not being able to find classes like String and ArrayList), but once that was done everything worked perfectly. Crucially, even though I was using Java 6 classes and tools, I was still running the actual IDE using Apple’s Java 5, so I didn’t have to use X11.

Running Maven from the command line presents no new problems. It will just use whichever JDK JAVA_HOME points to. Unfortunately the IDEA Maven plugin uses whatever the IDE is using and doesn’t allow this to be changed. This is in contrast to IDEA’s Ant support, which allows the JDK to be specified explicitly. The inconsistency is probably because the Ant plugin was written by JetBrains whereas the Maven plugin is a community contribution. Running IDEA itself under SoyLatte would address this issue but, for me at least, using Maven from a terminal is preferable to running IDEA under X11.

On Maven

Posted in Java by Dan on December 20th, 2007

If I ever wanted to write down the issues I have with Maven, I couldn’t do a better job than Charles Miller’s post. Maven solves problems that I didn’t have by introducing me to a whole set of new problems that I also didn’t have.

Ant is certainly not perfect (though you can learn to use it effectively) and I remain convinced that a better solution to Java build problems can be found. However, if Maven is the answer, then we are asking the wrong question.
