Cedric’s Coding Challenge

Posted in Haskell by Dan on June 28th, 2008

Cedric Beust has posted a programming challenge on his blog.  Seeing as I have nothing better to do, I thought I’d give it a go in Haskell.  Here is a naive brute-force solution that I came up with.  I’m sure there are many optimisations that can be made.

import List(nub)
 
-- Predicate that returns True if a number contains no duplicate digits.
noDuplicateDigits :: Int -> Bool
noDuplicateDigits n = length (nub (digits)) == length digits
                      where digits = show n
 
values :: Int -> [Int]
values n = [x | x <- [1..n], noDuplicateDigits x]
 
-- Given a positive integer n, return the number of values between 1 and n that
-- contain no duplicate digits, and the largest gap between two consecutive
-- values in that sequence.
answer :: Int -> (Int, Int)
answer n = (length sequence, gap)
           where sequence = values n
                 gap = maximum (zipWith subtract sequence (tail sequence))

ReportNG 0.9.6 – HTML and XML Reports for TestNG

Posted in Java by Dan on June 24th, 2008

ReportNG version 0.9.6 is now available for download. ReportNG is a plug-in for TestNG that provides improved HTML reports and improved JUnit-format XML reporting.

This is a bug fix release. Issues addressed include:

  • ISSUE 23 – Inaccurate aggregate timings for tests.
  • ISSUE 25 – NullPointerException when a test failure is due to a Throwable that has a null message.
  • ISSUE 26 – Optionally disable escaping of output logs so that HTML can be inserted into reports.

The fix for ISSUE 26 is a system property that will turn off escaping of Strings embedded in the HTML reports:

org.uncommons.reportng.escape-output=false

This means that, with escaping disabled, FEST will work with ReportNG the way it does with the default TestNG reporter (i.e. hyperlinks will be inserted in the HTML report). The default is still for all log output to be escaped. I don’t recommend that you turn the escaping off unless you really need to, because it will mess up your reports if you happen to log any strings that contain characters such as ‘<’, ‘>’ and ‘&’.
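If you do need it, the property is set like any other Java system property, as a -D argument to the JVM that runs the tests (if your tests run in a forked JVM via Ant or Maven, use the build tool’s usual mechanism for passing system properties through to that JVM):

-Dorg.uncommons.reportng.escape-output=false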

Thank you to the ReportNG users who submitted bug reports and patches for these issues.

Teach Yourself with University CS Resources

Posted in Software Development, The Internet by Dan on June 23rd, 2008

Over at DZone, I saw an article titled “Who Needs a Computer Science Degree When There’s Wikipedia?”. It suggests that you can learn as much from Wikipedia as you can by pursuing a formal university education in Computer Science. Sure, Wikipedia can be extremely informative (at least as an initial resource), but a random walk through the Wikipedia jungle could take you anywhere. It’s not a very structured syllabus.

I’ve been through a university CS education. I’m not going to argue the pros and cons of it here. Instead I’m more interested in how to acquire similar knowledge freely via the web. I’m certain that there are better approaches than trawling through Wikipedia (though Wikipedia would remain invaluable for background reading and finding references to more authoritative sources).

For me, the most obvious place to start is the universities themselves. Have a look at the Computer Science department websites and you will find that many of them provide access to course materials for anyone to download. One of the perils of teaching yourself is that you often don’t know what you don’t know. Unlike Wikipedia, the university content will be from a structured course, designed to teach the important stuff and avoid leaving huge blind spots in your knowledge.

Unlike going to university for real, you don’t have to worry about fees, academic records or geography. You get to pick from the best universities worldwide to provide your education. Leading the way is MIT, with its OpenCourseWare program. This provides high-quality content tailored for remote learning. But there are many other universities that provide access to lecture notes (or videos) and exercises.

I was thinking how useful it would be if there was a website that collated links to the best online CS course materials. Then, quite by accident, I stumbled across Google’s CS Curriculum Search. This is a Google web search restricted to the CS departments of universities. It categorises the results into “Lectures”, “Assignments” and “Reference”. It seems to be a very useful tool.

The Curriculum Search is part of the Google Code University, which includes their own content related to CS topics that are important to them (e.g. distributed computing and web security).

Another resource that may prove useful is Scholarpedia, which I have mentioned before.

Java Archaeology: Revisiting 20th Century Code

Posted in Java by Dan on June 20th, 2008

Java 1.1 was released in 1997. How does code from this era compare to today’s Java code?

Following my decision to resurrect an old project, I was faced with the prospect of converting the codebase from Java 1.1 source code to something a little more civilised. The oldest parts of this software date back nearly 9 years. Even then Java 1.1 was a little old, with 1.2.2 being the latest and greatest. Though I re-wrote much of the code 4 or 5 years ago, when Java 1.4.2 was state-of-the-art, it still targeted Java 1.1 VMs for compatibility reasons (i.e. Microsoft’s VM that used to be bundled with IE).

Trawling through this code was an interesting look back to a time when Java developers could choose to use any type of collection as long as it was Vector, and pretty much the only GUI toolkit available was AWT. It was also interesting to revisit the implementation decisions made back when I didn’t know nearly as much as I thought I did (though, to be honest, that’s probably still the case).

Pre-historic Collections

The first thing I noticed was the clunky use of collections and arrays in the code. Not only did this software pre-date generics, but being tied to Java 1.1 meant that even the Collections Framework was off-limits. Arrays, Vectors and Hashtables were all there were.

My original code used arrays wherever a fixed length was acceptable and Vectors everywhere else. Preoccupied by the difficulty of keeping track of the type of the Vector’s contents, I had written many of the methods so that they converted their internal Vectors into arrays before returning. In modernising the code, I had similar problems remembering what types went in many of the collections that I had declared several years before. Don’t let anyone tell you that generics were not worth it. They’re invaluable for communication of intent alone.

On a similar note, the richer variety of collections that arrived with Java 2 allows code to be written that more accurately conveys the thinking of the developer. The same method that returned an array or Vector in the legacy code could be rewritten to return List, Set, SortedSet, or just Collection. Each of these options communicates a different notion of what to expect from the method.
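To illustrate the difference, here is a sketch of the kind of change involved (the Team and Player classes are invented for the example rather than lifted from the real code):

import java.util.ArrayList;
import java.util.List;

public class Team
{
    private final List<Player> players = new ArrayList<Player>();

    // Java 1.1 style: an internal Vector, converted to an array before
    // returning so that callers at least knew the element type:
    //
    //     public Player[] getPlayers()
    //     {
    //         Player[] result = new Player[playerVector.size()];
    //         playerVector.copyInto(result);
    //         return result;
    //     }

    // Java 5 style: the generic interface type documents the contents and
    // promises no more than callers need (an ordered collection of Players).
    public List<Player> getPlayers()
    {
        return new ArrayList<Player>(players);
    }
}

class Player
{
    // Details omitted; any class would do for the purposes of the example.
}

The old signature said nothing about what was in the Vector; the new one does, and the compiler enforces it.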

Sorting

The purists wept as one of the pillars of Computer Science education crumbled before them and a thousand copies of The Art of Computer Programming were consigned to the shelf indefinitely. The Collections Framework in Java 2 introduced a general purpose sort method that was good enough for the vast majority of your sorting needs. Rather than worrying about remembering how to implement QuickSort and why it was better than an Insertion Sort or a Bubble Sort, just use the modified Merge Sort that Josh Bloch and his colleagues had helpfully provided. The Comparator interface meant that this one tried-and-tested implementation was applicable to just about any problem.
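For anyone who never had to live without it, this is roughly all the sorting code that the Java 2 approach requires (a minimal, self-contained sketch using Java 5 anonymous-class syntax; the data is made up):

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortExample
{
    public static void main(String[] args)
    {
        List<String> teams = Arrays.asList("juventus", "ajax", "porto", "milan");
        // One tried-and-tested sort, parameterised by a Comparator.
        Collections.sort(teams, new Comparator<String>()
        {
            public int compare(String s1, String s2)
            {
                // Order by name length, shortest first.
                return s1.length() - s2.length();
            }
        });
        System.out.println(teams); // [ajax, porto, milan, juventus]
    }
}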

The first things to go as I brought my code up-to-date were my plagiarised sort method and my custom version of the Comparator interface.

Enumerated Types

The lack of enumerated types in the early versions of Java was a curious oversight. I found my old code littered with integer constants. Effective Java remained unwritten in those days so I was ignorant of the type-safe enum pattern. I compounded my ignorance by writing methods that took three of these integer constants as arguments, each representing a conceptually different type. It was very easy to get the arguments the wrong way round and introduce all manner of bugs.
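A sketch of the problem and of the Java 5 fix (the Venue and Result names are invented for illustration, not the actual constants from the old code):

public class Match
{
    // Pre-Java 5, "enumerated types" were just int constants, and nothing
    // stopped you passing a venue where a result was expected:
    //
    //     public static final int HOME = 0;
    //     public static final int AWAY = 1;
    //     public static final int WIN  = 0;
    //     public static final int LOSS = 1;
    //
    //     match.record(WIN, HOME); // compiles, silently wrong
    //
    // With real enums, the same mistake is a compile-time error.
    public enum Venue { HOME, AWAY }
    public enum Result { WIN, DRAW, LOSS }

    public void record(Venue venue, Result result)
    {
        // Record the outcome (details omitted).
    }
}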

User Interface

It amazes me that people ever achieved anything worthwhile with just AWT. While I would use my mastery of the GridBagLayout to earn the respect of my peers, the lack of a rich set of GUI controls was very limiting. No tables, trees, tab sets or sliders for a start.

Swing has its critics, but I’m not one of them. There were some performance and look-and-feel issues in the early years, but I think that on the whole they got a lot more right than they got wrong. Now that I am not tied to Java 1.1, converting my project’s UI from AWT to Swing is the next item on the agenda. It will be with great pleasure that I rip out the sluggish and horrific table-simulating code that I built from a mess of GridLayouts and Labels.

[Swing was available as an external library for use with Java 1.1, but I had originally steered clear of this option because of the effect this would have had on the software’s download size]

Tools

When I began this project in 1999, Apache Ant was still a little-known part of the Tomcat project. I think JUnit was probably around then, but it was not so well known (certainly not to me). I wrote the original source code in the venerable PFE, a capable editor but one without even syntax highlighting. I built the code using a Windows 98 batch file. The idea of writing automated unit tests did not even occur to me. These days IntelliJ IDEA, Ant and TestNG are essential to my development efforts.

I have no doubt that people will still be writing Java code in another decade, though how significant it will be, and how far removed from today’s code, remains to be seen.  Maybe the rest of us will have all switched to JavaNG?

Software That Won’t Die: FSA goes Open Source

Posted in Java by Dan on June 18th, 2008

Having previously declared it deader than a Norwegian Blue, I’ve decided to resurrect my earliest, and unquestionably most obsolete, public Java project. The Football Statistics Applet (FSA) has been providing client-side league tables and statistics to numerous football websites for 8 or 9 years now.

Applets? Java 1.1? AWT? So last century. But despite that, it continues to be undoubtedly the most popular piece of software that I’ve written, even though I might have preferred one of my other projects to have claimed that tepid distinction.

A server-side solution (probably PHP and MySQL) to the same problem would seem much more sensible, but FSA was born of an age (1999/2000) when even static web hosting accounts were more expensive than hosting with server-side processing and databases is today.

So, other than a developer who is not able to let go of one of his creations, why not let it die?

Firstly, applets might just be about to make something of a come-back (possibly? …no? OK then). Sun is finally addressing some of the issues that have made them unpopular and is positioning Java as a viable contender in the RIA space. But even if this new momentum comes to nothing, the core of FSA could be useful in other applications. Maybe a servlet-based web-app, or it could be converted to a WebStart application.

Secondly, I receive enough mail about FSA to convince me that there are still people interested in seeing it revived. There are a number of minor enhancements and major new features that have been requested over the years.

Previously I avoided open-sourcing FSA because I didn’t feel that the source code represented the best that I was capable of. I was insecure. I didn’t want to be judged as a shoddy programmer. Now I don’t care what you think, so FSA is on Java.net under the terms of the GPL. I’ve started dragging it into the 21st century. The first thing that had to go was the dependency on Java 1.1. I’ve converted everything to use Java 5.0 collections and language features and made the code a bit more OO (it’s now not so much C-in-disguise as it was before). The next task is to replace the ugly and rather basic AWT GUI with Swing.

Anyone wanting to join the effort of improving the mediocre code (no unit tests) or the rather sparse documentation is welcome to request a role on the project via Java.net.

The Database State: 42 Days? Try 149 Trillion Years

Posted in The Real World by Dan on June 12th, 2008

The big story in the UK in the last couple of days has been the vote in the House of Commons on increasing the maximum period that somebody can be detained on suspicion of terrorism without charge to 42 days. When the Labour Party came to power in 1997 you could not be held without charge for more than 7 days. Under Tony Blair this was doubled to 14 days. It was later doubled again, to 28 days, but only after an attempt to extend it to 90 days was narrowly defeated.

Yesterday the government’s bill to extend detention without charge to 42 days was passed by the House of Commons (though it may yet be rejected by the House of Lords or ruled illegal by the European Court of Human Rights). This prompted prominent opposition MP David Davis to resign from Parliament this morning (more on him later).

Throughout the many debates on this issue over the last few years the justification in favour of the increase has been that terrorist plots are becoming ever more complex. We are constantly told about cases that require the police to examine hundreds of computers and thousands of CDs in the search for evidence. The extra time is needed, so we’re told, to allow police to access these “encrypted” files. The Prime Minister himself mentioned encryption in his press conference this morning (towards the end of the video clip embedded on that page):

“…certainly involving encrypted computers and everything else… that they will need more time to deal with that.” – Gordon Brown

Any terrorist who allows their encryption to be cracked by the police within 42 days was not paying attention at terrorist school. The real world is not like a badly written crime drama where the stereotypical nerd cracks the bad guys’ encryption in less than an hour using a desktop PC and a 3D screensaver. Even a 128-bit AES key would take trillions of years to crack:

16. What is the chance that someone could use the “DES Cracker”-like hardware to crack an AES key?

In the late 1990s, specialized “DES Cracker” machines were built that could recover a DES key after a few hours. In other words, by trying possible key values, the hardware could determine which key was used to encrypt a message.

Assuming that one could build a machine that could recover a DES key in a second (i.e., try 2^55 keys per second), then it would take that machine approximately 149 thousand-billion (149 trillion) years to crack a 128-bit AES key. To put that into perspective, the universe is believed to be less than 20 billion years old.
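For anyone wondering where that figure comes from, a rough back-of-the-envelope check (assuming the 2^55 keys-per-second rate from the quote, and that on average half of the 2^128 possible keys must be tried before the right one turns up) gives much the same answer:

2^127 keys ÷ 2^55 keys/second = 2^72 seconds ≈ 4.7 × 10^21 seconds ≈ 1.5 × 10^14 years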

So it seems to me that the debate should be about detaining suspects without charge for up to 149 trillion years (and let’s just hope that the terrorists don’t think to use 256-bit keys). Anything less would be an ineffective compromise. David Davis touched on this in his resignation speech:

“…because the generic security arguments relied on will never go away – technology, development and complexity and so on, we’ll next see 56 days, 70 days, 90 days…” – David Davis

David Davis is a Computer Science graduate, so he is probably more aware than other MPs of the absurdity of the idea of detaining suspects while their encryption is cracked.

Davis has chosen to wage war on the Labour government’s approach to civil liberties. In his criticism of the national DNA database and proposed ID cards he coined the term “The Database State”. I think this is a phrase we will be hearing a lot more of in the coming months. Especially given recent failures to protect sensitive data.

Great Innovations in Computing

Posted in Software Development by Dan on June 8th, 2008

Jeff Atwood poses the question “What do you think the single greatest invention in computer science is [other than the computer itself]?”

Of course, all of this depends on how strictly you define Computer Science and whether or not you include innovations that could more accurately be classified as Software Engineering or Electronic Engineering achievements.

“Computer science is no more about computers than astronomy is about telescopes.” – Edsger Dijkstra

Jeff himself, quoting Steve McConnell, opts for the “routine” (as in procedure/method/function) as the greatest invention. The comments on Coding Horror suggest alternatives, with the compiler being perhaps the most compelling contender.

My vote goes elsewhere. A strong case could be made for the Internet (in general, or the World Wide Web specifically) as the greatest invention. But, like Jeff, I’m going for something much more fundamental. The question is framed to exclude the computer itself, but I would identify one particular refinement to the design of the modern computer as being arguably the most significant milestone in Computer Science. Or perhaps more accurately, it is an advance in the design of the machines that allow practical application of a science that Dijkstra suggested did not really need machines at all.

[Book cover: Prisoner’s Dilemma: John von Neumann, Game Theory and the Puzzle of the Bomb]

In February 1946, the University of Pennsylvania unveiled the ENIAC, the first general-purpose, Turing-complete electronic computer. It was programmed by manipulating switches and cables. By hand. The ENIAC project had been in progress since 1943 and, by the time of its unveiling, work was already underway on its successor, the EDVAC. In June 1945, John von Neumann distributed a document entitled First Draft of a Report on the EDVAC. This 101-page document, written before the end of the Second World War, contained the outline of pretty much every general-purpose computer built since. The design of the EDVAC differed from that of the ENIAC in two fundamental ways. Firstly, it used binary numbers rather than decimal. Perhaps even more significantly, it was to be a stored-program computer.

By defining an instruction set and treating computations as sequences of these instructions, stored in the machine’s memory just like data, a stored-program computer is considerably more flexible than previous hard-wired machines.
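To make that concrete, here is a toy sketch in Java (the instruction set is invented for the illustration, not any real machine’s): the “program” is nothing more than numbers sitting in the same memory array as the data it operates on.

public class ToyStoredProgramMachine
{
    // Op-codes for an invented three-instruction machine.
    static final int HALT  = 0;
    static final int LOAD  = 1;  // LOAD addr   -> acc = memory[addr]
    static final int ADD   = 2;  // ADD addr    -> acc = acc + memory[addr]
    static final int STORE = 3;  // STORE addr  -> memory[addr] = acc

    public static void main(String[] args)
    {
        // One flat memory: cells 0-6 hold the program, cells 10-12 hold data.
        int[] memory = new int[16];
        int[] program = {LOAD, 10, ADD, 11, STORE, 12, HALT};
        System.arraycopy(program, 0, memory, 0, program.length);
        memory[10] = 2;
        memory[11] = 3;

        // The fetch-decode-execute loop: the machine reads its instructions
        // from the same memory it computes with.
        int acc = 0;
        int pc = 0;
        while (memory[pc] != HALT)
        {
            int opcode = memory[pc];
            int operand = memory[pc + 1];
            switch (opcode)
            {
                case LOAD:  acc = memory[operand];        break;
                case ADD:   acc = acc + memory[operand];  break;
                case STORE: memory[operand] = acc;        break;
            }
            pc += 2;
        }
        System.out.println(memory[12]); // prints 5
    }
}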

Like all the best ideas it seems fairly obvious in hindsight, but the implications of this innovation are difficult to overstate. As well as the powerful flexibility it offered, the advent of the stored-program computer meant that it was now possible to write programs that wrote programs. This was an advance that would eventually lead to that other great computing invention: the compiler. Code was data.

The stored-program approach came to be known as the von Neumann architecture, though this ignores the contributions of von Neumann’s collaborators and contemporaries. Many of the ideas were reportedly established before von Neumann became involved in the project. Nor was the EDVAC the first working stored-program computer. That honour goes to the University of Manchester’s Small-Scale Experimental Machine (SSEM or “Baby”), which became operational in June 1948.