Software That Won’t Die: FSA goes Open Source

Posted in Java by Dan on June 18th, 2008

Having previously declared it deader than a Norwegian Blue, I’ve decided to resurrect my earliest, and unquestionably most obsolete, public Java project. The Football Statistics Applet (FSA) has been providing client-side league tables and statistics to numerous football websites for 8 or 9 years now.

Applets? Java 1.1? AWT? So last century. But despite that, it continues to be undoubtedly the most popular piece of software that I’ve written, even though I might have preferred one of my other projects to have claimed that tepid distinction.

A server-side solution (probably PHP and MySQL) to the same problem seems much more sensible, but FSA was born of an age (1999/2000) when even static web hosting accounts cost more than accounts with server-side processing and databases do today.

So, other than a developer who is not able to let go of one of his creations, why not let it die?

Firstly, applets might just be about to make something of a come-back (possibly? …no? OK then). Sun is finally addressing some of the issues that have made them unpopular and is positioning Java as a viable contender in the RIA space. But even if this new momentum comes to nothing, the core of FSA could be useful in other applications. Maybe a servlet-based web-app, or it could be converted to a WebStart application.

Secondly, I receive enough mail about FSA to convince me that there are still people interested in seeing it revived. There are a number of minor enhancements and major new features that have been requested over the years.

Previously I avoided open-sourcing FSA because I didn’t feel that the source code represented the best that I was capable of. I was insecure. I didn’t want to be judged as a shoddy programmer. Now I don’t care what you think, so FSA is available on Java.net under the terms of the GPL. I’ve started dragging it into the 21st century. The first thing that had to go was the dependency on Java 1.1. I’ve converted everything to use Java 5.0 collections and language features and made the code a bit more OO (it’s no longer quite the C-in-disguise it was before). The next task is to replace the ugly and rather basic AWT GUI with Swing.

Anyone wanting to join the effort of improving the mediocre code (no unit tests) or the rather sparse documentation is welcome to request a role on the project via Java.net.

Why are you still not using Hudson?

Posted in Java, Software Development by Dan on May 9th, 2008

This week Hudson was awarded the Duke’s Choice Award in the Developer Solutions category at JavaOne.

In the space of a couple of years, Hudson has come from nowhere to become the leading contender among Continuous Integration servers. It’s head and shoulders above the other free alternatives, and arguably at least as good as the commercial offerings.

The venerable Cruise Control led the way for many years, but it suffers for having been first. Configuring builds is a lot more work than it needs to be. Continuum improved on this by allowing everything to be controlled via the web interface. Continuum is simple and useful. For a while I used it and I was happy.

Then JetBrains gave away free TeamCity licences with IntelliJ IDEA 6.0 and opened my eyes to a world beyond the fairly basic functionality of Continuum. I was impressed (pre-tested commits are a neat feature), but because you needed licences for every user, I was never able to fully commit to it.

NOTE: JetBrains have since introduced a free professional edition of TeamCity.

Anyway, at some point last year I took a serious look at Hudson. I’d been vaguely aware of it for a little while but never been compelled to try it out.

Hudson is impressive. It is ridiculously easy to install. It has the same ease of configuration that makes Continuum so simple, but combines it with high-end features, such as distributed builds, that are usually only found in commercial offerings like TeamCity and Atlassian’s Bamboo.

Hudson is primarily the work of Sun Microsystems’ Kohsuke Kawaguchi. Kohsuke is a prolific Open Source developer. With Hudson he has designed a system with well thought-out extension points that have enabled an army of plug-in writers to deliver a bewildering array of capabilities.

Out-of-the box Hudson supports CVS and Subversion repositories. Plug-ins extend this list to include Git, Mercurial, Perforce, ClearCase, BitKeeper, StarTeam, Accurev and Visual SourceSafe. Hudson also supports pretty much any type of build script (including Ant, Maven, shell scripts, Windows batch files, Ruby, Groovy and MSBuild).

In addition to e-mail and RSS, Hudson can also notify you of build events via IRC and Jabber as well as via a system tray/dock applet. Of course, all of these were too mundane for Kohsuke, so he built his own glowing orb.

But Hudson is much more than just a Continuous Integration server. It’s a complete build management and tracking solution. As well as publishing Javadocs, archiving build artifacts, and monitoring and graphing JUnit/TestNG results over time, you can also track and plot code coverage (Cobertura, EMMA and Clover are all supported) and coding violations (via FindBugs, Checkstyle and/or PMD). And if all that’s not enough, you can play the Continuous Integration game.

So why are you still not using Hudson?

A Java Programmer’s Guide to Random Numbers, Part 3: Seeding

Posted in Java by Dan on April 10th, 2008

The trilogy concludes.

This is the last of three articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security (when not losing money depends on nobody knowing what happens next)

Part 3: Seeding

Part 1 of this series discussed different kinds of random number generators (RNGs), highlighted the issues with using the default Java RNGs (java.util.Random and java.security.SecureRandom) and introduced the Uncommons Maths library, which provides three fast, high-quality alternative Java RNGs. Part 2 showed how to generate random numbers from different probability distributions using the classes provided by Uncommons Maths.

One aspect of pseudorandom number generators (PRNGs) that we glossed over in part 1 is the issue of seeding. In this article, we’ll look at two important issues related to the seeding of PRNGs.

Degrees of Freedom

Suppose we want to shuffle a standard deck of 52 playing cards. The Java API provides an implementation of a very good algorithm for shuffling a list (see Collections.shuffle). It even provides a variant of this method that allows us to specify our own RNG to be used. So shuffling a deck of cards (assuming it is represented as a List) is easy in Java.
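For example (a minimal sketch, representing the cards as plain integers for brevity and using one of the Uncommons Maths RNGs introduced in part 1):

// Represent the deck as a list of 52 values and shuffle it, specifying
// which RNG to use (here a Mersenne Twister rather than java.util.Random).
List<Integer> deck = new ArrayList<Integer>();
for (int i = 0; i < 52; i++)
{
    deck.add(i);
}
Random rng = new MersenneTwisterRNG();
Collections.shuffle(deck, rng);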

Unfortunately, even though the algorithm is a perfect shuffle and we can choose an RNG that exhibits no bias, our shuffling code is still likely to be flawed in one respect. The problem is that the RNG is almost certainly not able to generate every possible ordering. There are many orderings of the deck that simply cannot occur. The RNG has insufficient degrees of freedom. To understand why this is, we need to consider how many ways there are of ordering a deck of 52 cards. I’ll spare you the maths and tell you that the answer is 52! (that’s 52 factorial) or, in longhand, 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000.

That’s a pretty big number.

Recall from the first article in this series that when we seed a PRNG we get the same sequence of output each time that we use the same initial seed. The maximum number of possible initial states for the PRNG is equal to the number of possible seed values. If each RNG state gives us a different shuffle, the number of different deck orderings we can generate is determined by the range of the RNG seed.

The CellularAutomatonRNG is very fast, but it only uses a 32-bit seed. So using it for shuffling restricts the result of any given shuffle to one of 4,294,967,296 outcomes. That’s 4.3 billion possibilities but it may as well be zero given the number of outcomes that should be possible but aren’t.  [Note that a second shuffle may well allow a different set of 4,294,967,296 permutations than the first (so over time a larger number of outcomes are possible) but for any nth shuffle the number of possible outcomes is restricted by the number of starting states.]

We would probably want to avoid using java.util.Random but, if we did use it, we’d give it a 64-bit long as its seed. The algorithm only actually uses 48 of those bits for seeding so that gives us 281,474,976,710,656 possible initial RNG states. 281 trillion is an improvement, but still woefully inadequate.

The MersenneTwisterRNG uses a 64-bit seed. That’s 18,446,744,073,709,551,616 possible outcomes for any given shuffle. Even this huge number is a miserable, tiny, insignificant fraction of one percent of the full set of 52! possibilities that we ought to be able to generate.

SecureRandom may use different algorithms internally, depending on the platform. On Windows, the default is the SHA1PRNG provider, which gives us a 160-bit seed. Ignoring SecureRandom’s slowness, this is getting us closer to where we want to be (if you can call less than one trillionth of one percent of the possible outcomes ‘close’).

One more RNG…

It turns out that we need a seed of at least 226 bits in order to achieve the necessary degrees of freedom. This rules out most PRNGs that you might otherwise consider. There is at least one option however: Uncommons Maths’ AESCounterRNG. The AESCounterRNG is based on the AES block cipher. AES is the Advanced Encryption Standard adopted by the US government. The characteristics of AES that make it a good encryption algorithm also coincidentally make it a good RNG. When we use AES as an RNG, the seed is the encryption key. AES supports 128-bit, 192-bit and 256-bit keys.

Using AESCounterRNG with a 256-bit seed, we have the necessary degrees of freedom so that all 52! orderings are possible outcomes of a shuffle. AESCounterRNG, while still much faster than java.security.SecureRandom, is not quite as fast as the other RNGs we’ve been looking at. It does however have other advantages that we shall see later.

Note: Using encryption keys longer than 128 bits in Java requires that you obtain and install the unlimited strength cryptography policy files from Sun.
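A shuffle with the full 52! degrees of freedom might look something like the sketch below. I’m assuming an AESCounterRNG constructor that accepts a byte array seed – check the Uncommons Maths Javadocs for the exact signature and any checked exceptions, which are not handled here.

// Seed the RNG with 256 bits (32 bytes) of entropy so that, in principle,
// every one of the 52! orderings of the deck is reachable.
byte[] seed = SecureRandom.getSeed(32);   // 32 bytes = 256 bits (seeding only, so SecureRandom's slowness doesn't matter)
Random rng = new AESCounterRNG(seed);     // assumed byte[] constructor - see the Javadocs
Collections.shuffle(deck, rng);           // deck as in the earlier snippet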

Security

Before you rush off to set up your new online casino, there’s one further issue to be considered. Remember that pseudorandom number generators are deterministic. That means that if we know which PRNG algorithm is in use and we know what seed was used, then we can predict with absolute certainty every single value that the RNG will ever produce. No problem for non-sensitive applications, but if people are gambling on the outcomes, you’d better be certain that they don’t have this kind of inside information.

Forget about seeding your RNG from the system clock – your seed data needs to be random itself or it is vulnerable (see How We Learned to Cheat at Online Poker: A Study in Software Security for a real-world account of how RNG weaknesses can be exploited in online poker). Even if an attacker doesn’t know the seed, with many algorithms it can be reverse-engineered by observing the RNG’s output and applying a little computing power.

To address these security concerns, the first step is to pick a PRNG implementation that is not susceptible to reverse-engineering. That brings us back to the AESCounterRNG. Being based on AES means that reverse-engineering the seed would involve cracking the cipher. Good luck doing that in a timely manner.

Disclaimer: You should not rely on the AESCounterRNG to address all of your security concerns. A fuller understanding of the issues can be gained by reading Ferguson and Schneier’s Practical Cryptography – particularly the description of their Fortuna PRNG design.

So where do we get the seed data from? We have a bit of a bootstrapping problem: in order to generate good quality, unpredictable random numbers, we first need some good quality, unpredictable random data. By default, all of the Uncommons Maths RNGs seed themselves from /dev/random on Unix systems. On Windows systems, seed data will be downloaded from random.org, and if neither of those strategies is successful (perhaps the SecurityManager prevents file or network access), the RNG will fall back on the seeding strategy provided by SecureRandom.generateSeed. However, seeding strategies are pluggable, so you can provide your own. One possible strategy is to use a true hardware RNG to bootstrap the faster PRNG that will generate the actual output.
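Something like the following would do the job (a sketch only – the SeedGenerator and SeedException signatures should be checked against the Uncommons Maths Javadocs, and the device path is purely illustrative):

// A hypothetical seeding strategy that reads entropy from a hardware RNG
// exposed as a character device.
public class HardwareDeviceSeedGenerator implements SeedGenerator
{
    public byte[] generateSeed(int length) throws SeedException
    {
        try
        {
            DataInputStream in = new DataInputStream(new FileInputStream("/dev/hwrng"));
            try
            {
                byte[] seed = new byte[length];
                in.readFully(seed);
                return seed;
            }
            finally
            {
                in.close();
            }
        }
        catch (IOException ex)
        {
            throw new SeedException("Failed reading hardware device.", ex);
        }
    }
}

// Plug the strategy into an RNG (may throw SeedException - handling omitted):
Random rng = new MersenneTwisterRNG(new HardwareDeviceSeedGenerator());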

The End

And that’s pretty much everything I can tell you about using random number generators in Java. If you have any feedback on Uncommons Maths, please use the issue tracker for bugs and suggestions. If you want to know more about pseudorandom number generators, the following books cover the theoretical and practical issues in much more depth, and much more precisely, than I have done.

  • The Art of Computer Programming, Volume 2: Seminumerical Algorithms (Donald Knuth)
  • Practical Cryptography (Ferguson & Schneier)

A Java Programmer’s Guide to Random Numbers, Part 2: Not just coins and dice

Posted in Java by Dan on April 6th, 2008

More than you ever wanted to know about randomness in Java programs.

This is the second in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security and Integrity (when not losing money depends on nobody knowing what happens next)

Part 2: Not just coins and dice

Part 1 of this series discussed different kinds of random number generators (RNGs), highlighted the issues with using the default Java RNGs (java.util.Random and java.security.SecureRandom) and introduced the Uncommons Maths library, which provides three fast, high-quality alternative Java RNGs.

But there is more to the random numbers package than just replacement RNGs. Uncommons Maths also provides convenient classes for working with several different probability distributions, which is pretty much essential for modelling anything other than the most simple random processes.

The Uncommons Maths random numbers demo application (WebStart required) demonstrates many of the concepts discussed in this article. It also serves as an example of the performance differences between different RNGs as discussed in the previous article.

Different Kinds of Randomness

Uniform Distribution

The Uniform Distribution is what you get by default from java.util.Random (and its sub-classes – including the Uncommons Maths RNGs). The discrete uniform distribution says that there are a certain number of possible outcomes and each is equally likely. The obvious example is rolling a die. You can get a 1, 2, 3, 4, 5 or 6, and no single outcome is more likely than any of the others. If you recorded the results of 6 million dice rolls, you would expect to get pretty close to a million of each number.
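A quick simulation (a sketch using one of the Uncommons Maths RNGs) shows the counts evening out:

// Roll a simulated die 6 million times and count the occurrences of each
// face.  With a uniform RNG each count should be close to one million.
Random rng = new MersenneTwisterRNG();
int[] counts = new int[6];
for (int i = 0; i < 6000000; i++)
{
    counts[rng.nextInt(6)]++;
}
for (int face = 0; face < 6; face++)
{
    System.out.println((face + 1) + ": " + counts[face]);
}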

The uniform distribution is useful for some problems, such as shuffling a deck of cards, but it is not a good choice for many other scenarios that don’t involve a set of equally likely outcomes.

Normal (Gaussian) Distribution

The Normal Distribution (also known as the Gaussian Distribution) is the familiar bell curve. It is a good model for many real world phenomena. In a normal distribution, the average outcome is more likely than any other outcome. A slightly above or below average outcome is almost as likely, and extreme outcomes, while still possible, are very unlikely.

An example of this would be the distribution of IQ test scores. Most people would achieve somewhere close to an average score (maybe slightly above or below), but a few people would achieve very low scores and a similarly small number would achieve very high scores.

The nextGaussian() method of the java.util.Random class provides rudimentary support for obtaining normally-distributed values. Uncommons Maths’ GaussianGenerator class builds on this, making it easy to create distributions with different means and standard deviations. The standard deviation parameter controls how spread out the values are. A low standard deviation means most values will be very close to the mean (on either side). A high standard deviation increases the likelihood that a value will be a long way from the mean.

GaussianGenerator is a wrapper around a standard RNG (you can use any of the Uncommons Maths RNGs, or even one of the standard Java ones if you so choose). Once you have created a GaussianGenerator by specifying a mean, standard deviation and source RNG, you can call the nextValue() method ad infinitum:

Random rng = new MersenneTwisterRNG();
GaussianGenerator gen = new GaussianGenerator(100, 15, rng);
while (true)
{
    System.out.println(gen.nextValue());
}

If we were to generate thousands of simulated IQ test scores with this code, we would see that the vast majority of scores would be close to the average (100). Around 68% of the values would be within one standard deviation (15) of the mean (i.e. in the range 85 – 115), and approximately 95% would be within two standard deviations (between 70 and 130). These percentages are a well-known property of normal distributions.

The demo shows how the distribution of generated values compares to a theoretical normal distribution (the more values you generate, the closer the fit will be).

Binomial Distribution

The probability of a coin flip coming up heads is 0.5. This is pretty simple – we can just use the uniform distribution to simulate a single coin flip. But what is the probability of getting exactly 7 heads from 10 coin flips? This is where the Binomial Distribution comes in. It provides probabilities for the outcome of a series of trials, where each trial has two possible results – success or failure. The binomial distribution tells us that the probability of obtaining heads 7 times from 10 fair coin flips is about 0.12.
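You can check that figure empirically with a few lines of code (a sketch that uses the BinomialGenerator class introduced below):

// Estimate the probability of exactly 7 heads from 10 fair coin flips by
// sampling the binomial distribution a large number of times.
Random rng = new MersenneTwisterRNG();
BinomialGenerator coinFlips = new BinomialGenerator(10, 0.5, rng);
int trials = 1000000;
int sevens = 0;
for (int i = 0; i < trials; i++)
{
    if (coinFlips.nextValue() == 7)
    {
        ++sevens;
    }
}
System.out.println("Estimated probability: " + (double) sevens / trials);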

Suppose, for example, we wanted to randomly generate the number of times the number 6 occurred in 100 dice rolls. We could simulate the full series of trials using a uniform RNG, but this is cumbersome:

Random rng = new MersenneTwisterRNG();
int sixes = 0;
for (int i = 0; i < 100; i++)
{
    int diceRoll = rng.nextInt(6) + 1;
    if (diceRoll == 6)
    {
        ++sixes;
    }
}
System.out.println("No. of 6s from 100 rolls is " + sixes);

It is much easier, and more efficient, to use the binomial distribution. If we consider an occurrence of a 6 to be a “success”, then the probability of success is 1/6 (or ~0.1667). If we plug this value and the number of trials (100) into the Uncommons Maths BinomialGenerator class, we can get our answer from a single method call:

Random rng = new MersenneTwisterRNG();
BinomialGenerator gen = new BinomialGenerator(100, 1d/6, rng);
int sixes = gen.nextValue();
System.out.println("No. of 6s from 100 rolls is " + sixes);

Poisson Distribution

Suppose a telephone call centre receives calls at an average rate of three per minute between 2pm and 4pm in the afternoon. What is the probability that we receive exactly five calls in the next minute? This is the kind of question that we can answer with the Poisson Distribution. The Poisson distribution is not dissimilar to the Binomial distribution, as you will see if you plot them. The difference is that the Poisson distribution is concerned with how many independent events occur within a set interval. Whereas values from a binomial distribution have an upper bound determined by the number of trials, there is no upper bound for a Poisson distribution. The probability of exactly five calls in a minute when the average is three is around 0.1 according to the Poisson distribution.

In simulation applications we can use the PoissonGenerator class (the mean is its only distribution parameter) to determine how many times something happens within a given interval.
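For example, to simulate the call centre above (a sketch – I’m assuming the constructor takes the mean and a source RNG, in the same style as the other generators):

// How many calls arrive in each of the next ten minutes, given an
// average rate of three calls per minute?
Random rng = new MersenneTwisterRNG();
PoissonGenerator calls = new PoissonGenerator(3, rng);
for (int minute = 1; minute <= 10; minute++)
{
    System.out.println("Minute " + minute + ": " + calls.nextValue() + " calls");
}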

Exponential Distribution

The Exponential Distribution is related to the Poisson Distribution. Rather than modelling how many events occur in a fixed period of time, it models the period of time between two independent events that occur at a given average rate.

Suppose you wanted to simulate some random event that happened on average 10 times a minute – perhaps you are simulating load for a server. You could simply generate events at the constant rate of exactly one every 6 seconds. But the real world is rarely this metronomic. A more realistic approach would be to generate events randomly, using an exponential distribution (provided by the ExponentialGenerator class) to determine the intervals between events:

final long oneMinute = 60000;
Random rng = new MersenneTwisterRNG();
 
// Generate events at an average rate of 10 per minute.
ExponentialGenerator gen = new ExponentialGenerator(10, rng);
boolean running = true;
while (running)
{
    // Wait for a randomly-generated interval before firing the next event.
    long interval = Math.round(gen.nextValue() * oneMinute);
    Thread.sleep(interval); // assumes the enclosing method handles InterruptedException
 
    // Fire event here.
}

In a future article: Shuffling the deck – degrees of freedom, seeding and security concerns.

A Java Programmer’s Guide to Random Numbers, Part 1: Beyond java.util.Random

Posted in Evolutionary Computation, Java by Dan on April 3rd, 2008

More than you ever wanted to know about randomness in Java programs.

This is the first in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security and Integrity (when not losing money depends on nobody knowing what happens next)

Part 1: Beyond java.util.Random

Random numbers are useful in a wide variety of software applications. They provide a crucial element of uncertainty in an otherwise deterministic world. Without random numbers in computers, the hugely popular online poker industry would not exist, video games would be boring and predictable, iTunes would have no shuffle, cryptography would be much more difficult, and many innovative algorithms, such as those used in artificial intelligence and evolutionary computation, simply couldn’t work.

True Random Numbers and Pseudorandom Numbers

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” – John von Neumann

Before continuing, it is important to make the distinction between so-called “true” random numbers and pseudorandom numbers. Though it may often seem otherwise to us as programmers, computer systems are infuriatingly deterministic. They are generally incapable of doing things randomly. To get a computer to behave in a way that is truly random, it is necessary to introduce some non-deterministic input. We could have somebody constantly rolling dice or flipping coins and typing in the results, but that is not very practical. A slightly more feasible approach is to construct a device that observes some real world phenomenon that is known to be unpredictable, such as radioactive decay or atmospheric noise. Data extracted from these events can be used as a source of entropy for our applications. You could purchase a device that plugs into a serial port or USB port. To access these devices from Java you’d probably have to use C/C++ and JNI. Alternatively, you could get true random numbers indirectly – from an online service such as Random.org or Hotbits.

Since we can get truly unpredictable random numbers from this kind of hardware, why don’t we satisfy all of our randomness requirements in this way? Well the primary problem is throughput. These devices are quite limited in the quantity of randomness they can produce. They simply aren’t fast enough for many uses.

Pseudorandom numbers are not really random at all. They are the result of deterministic mathematical formulae. The best of these algorithms have been devised so that their output is statistically indistinguishable from true random numbers. PRNGs start with a single numeric seed value. The algorithm is applied to this seed to generate the output and a new, updated seed that is used to generate the next value. The mathematics involved is beyond the scope of this article – the definitive guide is the second volume of Donald Knuth’s The Art of Computer Programming.
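To give a flavour of what goes on under the hood, java.util.Random is a linear congruential generator: each call mixes the 48-bit seed with a multiplier and an increment, and the output is derived from the top bits of the new state. The constants below are the ones documented in the java.util.Random Javadocs; this is a simplified sketch, not the complete implementation:

// One state transition of a linear congruential generator (LCG).
long seed = 42;                                          // initial seed
seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);  // update the 48-bit state
int output = (int) (seed >>> 16);                        // next output: top 32 bits of the new state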

An interesting property of this approach is that if you always start with the same seed value, you will always get the same sequence of “random” numbers. Though this can occasionally be useful, you would normally strive to avoid using the same seed value in the interests of unpredictability. A simple approach that is sufficient in many cases is to seed the PRNG from the current value of the system clock.
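For example, two instances of java.util.Random seeded with the same value will produce exactly the same output, whereas seeding from the clock gives a different sequence on each run:

// Identical seeds produce identical "random" sequences.
Random rng1 = new Random(42);
Random rng2 = new Random(42);
for (int i = 0; i < 5; i++)
{
    System.out.println(rng1.nextInt() + " == " + rng2.nextInt());
}

// Seeding from the system clock avoids repeating the same sequence from
// run to run (though, as we'll see later in the series, this is not good
// enough for security-sensitive applications).
Random clockSeeded = new Random(System.currentTimeMillis());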

Aside from speed, another advantage of pseudorandom number generators (PRNGs) over true random number generators (TRNGs) is that they are more predictably unpredictable. That is, the statistical properties of PRNGs are more reliable than is the case with TRNGs.

Statistical Quality

(xkcd comic on random numbers – xkcd.com)

Why not java.util.Random?

The Java API helpfully provides a PRNG that we can use, the java.util.Random class. If you are a Java programmer you will almost certainly have used it at some point. For non-critical random numbers, e.g. adding some unpredictability to a game, it’s fine. It’s even pretty quick. Unfortunately, it’s not random enough – not by the standards required for more serious applications.

The problem of deciding whether a sequence of numbers is random is not an easy one to solve. You can’t simply look at the output and decide that it doesn’t look random enough. After all, if you toss a coin ten times, it could randomly come up heads every time, even though the probability of this sequence is pretty small. To get any kind of meaningful evaluation requires a large sample of the RNG output (perhaps millions of generated values). This sample can then be subjected to sophisticated statistical analysis.

Probably the best known test suite for random number generators is George Marsaglia’s Diehard Battery of Tests of Randomness. Diehard says that java.util.Random is not sufficiently random. But you don’t have to interpret Diehard’s complicated reports to see for yourself – this applet demonstrates it visually.

Performance

So if not java.util.Random, how about Java’s other RNG, java.security.SecureRandom? SecureRandom is built for cryptography so it is specifically designed to avoid any such flaws. Diehard reports no issues with its output. Unfortunately, this quality comes at a high price – performance. In benchmarks, SecureRandom can be up to 60 times slower at generating random numbers than java.util.Random. This is bearable if you are only generating random values infrequently, but if your program relies on generating random numbers non-stop and as fast as possible (as many simulations do) then it’s a show-stopping bottleneck.

The good news is that, beyond the core API, there are random number generators as fast as (or faster than) java.util.Random with statistical properties as good as SecureRandom. The Uncommons Maths project provides a comprehensive open source random numbers package. Uncommons Maths provides three Random Number Generators, with different properties, for a wide variety of applications. Unlike java.util.Random, each of these RNGs completes the Diehard suite without any problems. Additionally, each of the Uncommons Maths RNGs is a sub-class of java.util.Random, which means any of them can be used as a drop-in replacement with minimal effort.

A good general-purpose PRNG is the MersenneTwisterRNG class. It is a pure Java port of Makoto Matsumoto and Takuji Nishimura’s proven and ultra-fast Mersenne Twister PRNG for C. Even faster is CellularAutomatonRNG, the Java port of Tony Pasqualoni’s experimental cellular automaton PRNG.
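Swapping one in is usually just a matter of changing a constructor call, for example:

// Because the Uncommons Maths RNGs extend java.util.Random, existing code
// that uses the standard API continues to work unchanged.
Random rng = new MersenneTwisterRNG();   // was: new Random()
double probability = rng.nextDouble();
int diceRoll = rng.nextInt(6) + 1;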

In the next article: Beyond dice and coins – using the right probability distribution.

Getters, Setters and the Great Coverage Conspiracy

Posted in Java, Software Development by Dan on April 1st, 2008

A frequent topic of Java-related blogs is whether it is worthwhile to write unit tests for simple getters and setters. This posting that I came across today proposes a reflection-based trick for eliminating much of the work in writing these tests. Maybe this is an improvement over other approaches, but what bothers me most is the motivation for wanting to test getters and setters in the first place.

It seems that many of those advocating unit testing for simple getters and setters are driven by a desire to improve their coverage scores, with the actual utility of the tests a secondary concern.

Firstly, I should state that I am absolutely in favour of measuring coverage for test suites. In fact, I think it’s pretty much essential. If you are writing automated tests but not measuring code coverage then you are just scratching around in the dark. What’s great about coverage reports, particularly those that show branch coverage as well as line coverage, is that you get to see exactly where your tests are neglecting certain scenarios. Coverage reports can also be useful in highlighting code that is not used and can be removed.

The problem with code coverage is that it only shows where your tests are weak. It does not prove that your tests are good, even if the coverage is 100%. So writing tests with the sole aim of improving the coverage score is merely an exercise in self-deception. It’s the tail wagging the dog.

If you need to add tests for all your getters and setters in order to achieve x% code coverage, where x is some mandated target, there are two questions you need to ask:

  1. Do you have too many getters and setters?
  2. Are you avoiding testing difficult code?

I could go on for pages about the first point. There are far too many getters and setters in most Java code. Too many developers think encapsulation is simply a case of making fields private and providing access to them with getters and setters. It would be better to aim for making fields private and not providing access to them with getters and setters. Favouring constructor-based dependency injection over setter-based DI is something else to consider (although that’s a whole other article in the making…).

How do you know if you have too many getters and setters? Well your coverage reports are a good starting point. If the getters and setters are essential to your application, it will be just about impossible to avoid exercising them indirectly from other tests. If you have good coverage elsewhere but the getters and setters aren’t touched, chances are they aren’t needed. Adding more tests is not the only way of improving your test coverage. Another way is to remove code so that you have less to test.

The second question above is also important. If you require your team to achieve a rigid 75% test coverage target then you are almost guaranteeing that you will get tests for the 75% of the application that is easiest to test. Writing tests for getters and setters helps to fulfil the 75% requirement without needing to think about how to test the difficult bits of the system. Unfortunately, the other 25% is probably the code that really needs testing/refactoring.

For me it’s pretty clear. Don’t write unit tests for getters and setters. Better still, don’t write getters and setters (except where necessary). And don’t confuse test-driven development with coverage-driven development.

Improved HTML reports for TestNG – ReportNG 0.9.5

Posted in Java by Dan on March 24th, 2008

ReportNG 0.9.5 is now available for download. ReportNG provides alternative HTML and XML reporting plugins for the TestNG test framework.

This release addresses all of the most important outstanding issues in the issue tracker. This includes proper handling of tests that use TestNG’s DataProvider feature, better escaping of special characters in XML and HTML output, and information about the test groups relevant to each test suite.

Thanks to everybody who contributed to this release by providing feedback and suggestions for the previous releases. Any new bug reports and enhancement requests can be entered into the issue tracker.

ReportNG 0.9.3 – XHTML and XML Reporting for TestNG

Posted in Java by Dan on March 5th, 2008

The latest version of ReportNG (version 0.9.3), a reporting plug-in for TestNG, is available for download. ReportNG generates XHTML reports and/or JUnit-format XML reports (for importing TestNG results into tools that recognise the JUnit format).

This is a bug-fix release that addresses a couple of problems with escaping of strings in the JUnit XML reporter.

ReportNG Updated – Version 0.9.2

Posted in Java by Dan on February 3rd, 2008

ReportNG, the reporting plug-in for TestNG, is now up to version 0.9.2.  The JUnit XML reporter introduced previously is now complete and there are also a few minor enhancements to the HTML report.

Maven Revisited: Fallacies and usefulness

Posted in Java, Software Development by Dan on February 3rd, 2008

The eternal love/hate debate around Maven has resurfaced again recently, triggered by Howard Lewis Ship’s denouncement of Maven following his experiences on the Tapestry project (InfoQ has a summary of the recent discussions). I’ve touched on this previously with a brief echoing of Charles Miller’s sentiments.

Meanwhile, Don Brown recently announced his one-man crusade to inject some much-needed common sense into how Maven works (just addressing the first 4 items on his list would make things much better). Don followed this up by applying yet more common sense, this time to how Maven works with plugins.

All of this activity and debate prompted me to reconsider exactly what it is about Maven that makes me avoid it wherever possible. It’s certainly not the convention over configuration. Maven’s conventions might not be to everyone’s taste, but the reasoning is sound (I employ a similar convention-over-configuration approach to Ant builds, using macros). The lack of documentation is definitely a contributing factor, and it’s certainly buggier than Ant. But in the end it all comes down to the dependency management and Charles Miller’s inescapable conclusion that it is “broken by design”.

Maven’s fondness for project dependencies scattered across continents fails to take into account many of the 8 Fallacies of Distributed Computing. Even ignoring the trust issues stemming from the outsourcing of your dependency-management, the more dependencies you have, the less likely you are to be able to access all of them at any given point in time. There will be failures beyond your control at some point.

Assuming that all of the repositories are available, you’ve then got to hope that somebody doesn’t upgrade a library that you depend on and introduce a bug into your application. If you haven’t taken the precautions to specify an exact version number, you could be in trouble. And if you have taken the precautions, why persist with this fragile nonsense? Why not just put the JAR files in the version control repository and remove most of the potential failure points? The same question occurred to Don Brown and he’s doing something about it (it’s point number 4 on his list).

So far, so negative. Are there any redeeming features of Maven? Well there is one use case where the dependency management actually makes a fair bit of sense: internal dependencies. By internal dependencies, I mean dependencies within an organisation. If your development team has multiple projects and there are dependencies between them, the Maven approach could be the solution. In this scenario, everything is within your control. The software is written by your developers and hosted on your servers that are managed by your admins.

This dynamic approach to dependency management is more flexible than copying and pasting different versions of source and/or binaries between projects. And if you limit it to internal projects, you can eliminate most of the downsides. Of course, you don’t have to use Maven for this. You may want to look at Ivy instead.
