Java 6 for 32-bit Macs…finally?

Posted in Java, Mac by Dan on August 25th, 2009

Apple’s OS X 10.6, code-named Snow Leopard, is released on Friday.  There is some suggestion that this will finally deliver Java 6 for 32-bit Intel Macs (more than two-and-a-half years after it debuted on other platforms). The news reaches me via James at DZone, who cites Axel’s blog, which in turn links to this 2-month-old post as evidence. There’s no primary source identified and, given Apple’s legendary pre-release silence, this is unlikely to be confirmed until some Java developer with a 32-bit Mac actually tries the Snow Leopard upgrade.

At present there are two not-entirely-satisfactory options for Java 6 development on 32-bit Mac hardware. The first is to use SoyLatte, which is fine for non-GUI work but only supports Swing under X11. The other option is to run the JVM under another OS via the magic of Parallels or VirtualBox.

Assuming that this rumour is true (and I remain sceptical), the key question is whether this update will be made available to Tiger and Leopard users via Software Update, or whether an OS upgrade is necessary. The Leopard-to-Snow-Leopard upgrade is reasonably priced, but Apple’s site implies that if you are upgrading from an earlier version your only option is the more expensive Mac Box Set (which also includes the latest versions of iLife and iWork).

UPDATE (28th August): It seems that the Snow Leopard “upgrade” is actually a full version of the operating system and can be used to upgrade machines running Tiger. However, doing so might be a breach of the End User Licence Agreement.

UPDATE (29th August): I asked on Stack Overflow whether anybody could confirm the presence of Java 6 on 32-bit Macs.  The question got bounced to the new Super User site, but I did get a couple of positive responses.  So it seems that yes, Java 6 is finally available to owners of 32-bit Macs, but only if you upgrade to Snow Leopard.

Watchmaker Framework for Evolutionary Computation – Version 0.6.1: Terracotta Clustering and more…

Posted in Evolutionary Computation, Java by Dan on August 3rd, 2009

I’ve just uploaded version 0.6.1 of the Watchmaker Framework for Evolutionary Computation.  If you’re not already familiar with the project, it is a library for implementing evolutionary/genetic algorithms in Java.  It’s multi-threaded, cross-platform, fast and has a modern, unobtrusive and flexible API.

API Improvements

One user-requested addition to the API in this release is the getSatisfiedTerminationConditions method. This makes it easy to determine which termination condition (elapsed time, generation count, stagnation, etc.) caused the evolution to terminate when you are using multiple termination conditions.
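For example, a sketch along these lines (it assumes an already-configured EvolutionEngine, and uses the GenerationCount and ElapsedTime conditions from the org.uncommons.watchmaker.framework.termination package) evolves with two conditions and then asks the engine which of them ended the run:

import java.util.List;
import org.uncommons.watchmaker.framework.EvolutionEngine;
import org.uncommons.watchmaker.framework.TerminationCondition;
import org.uncommons.watchmaker.framework.termination.ElapsedTime;
import org.uncommons.watchmaker.framework.termination.GenerationCount;

public class TerminationExample
{
    // Evolves using an already-configured engine and reports which of the
    // two termination conditions actually ended the evolution.
    public static <T> T evolveAndReport(EvolutionEngine<T> engine)
    {
        T fittest = engine.evolve(100,                       // Population size.
                                  5,                         // Elite count.
                                  new GenerationCount(1000), // Stop after 1000 generations...
                                  new ElapsedTime(60000));   // ...or after 60 seconds.
        List<TerminationCondition> satisfied = engine.getSatisfiedTerminationConditions();
        for (TerminationCondition condition : satisfied)
        {
            System.out.println("Terminated by: " + condition);
        }
        return fittest;
    }
}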

The API documentation has also been improved in a few places to make things clearer. Firstly, the framework does not support negative fitness scores.  In previous releases it may have worked under some circumstances, but it was undefined behaviour.  In this release you will get an IllegalArgumentException if you try it.

Secondly, if you are using an EvolutionObserver to update a Swing GUI, be careful not to overwhelm the AWT event dispatch thread with updates. The framework can get through dozens or hundreds of generations per second, which is faster than the GUI can sensibly be repainted, so if every generation triggers an update (more likely with small population sizes) the GUI will become sluggish and unresponsive. You can mitigate this by minimising the work that your EvolutionObserver does on the AWT thread, or by only updating the GUI every nth generation.
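As a rough illustration (the observer class itself is hypothetical and the every-50-generations interval is arbitrary; it assumes the EvolutionObserver and PopulationData API as I understand it), a throttled observer might look something like this:

import javax.swing.JLabel;
import javax.swing.SwingUtilities;
import org.uncommons.watchmaker.framework.EvolutionObserver;
import org.uncommons.watchmaker.framework.PopulationData;

// Forwards only every 50th update to the AWT thread so that a fast-running
// evolution doesn't swamp the GUI with repaints.
public class ThrottledObserver<T> implements EvolutionObserver<T>
{
    private final JLabel fitnessLabel;

    public ThrottledObserver(JLabel fitnessLabel)
    {
        this.fitnessLabel = fitnessLabel;
    }

    public void populationUpdate(final PopulationData<? extends T> data)
    {
        if (data.getGenerationNumber() % 50 == 0) // Skip most generations.
        {
            SwingUtilities.invokeLater(new Runnable()
            {
                public void run()
                {
                    fitnessLabel.setText("Best fitness: " + data.getBestCandidateFitness());
                }
            });
        }
    }
}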

Distributed Fitness Evaluations with Terracotta

There have also been some internal modifications to make the framework more amenable to clustering with Terracotta.  Terracotta can now be used to distribute the workload across multiple machines.  It’s only at the proof-of-concept stage at the moment – there is no support for handling node failures.  It’s also only really worthwhile for evolutionary programs that have expensive fitness functions.  The fitness function has to be expensive enough to justify the cost of transferring the candidate across the network for evaluation on another machine, otherwise clustering just makes things slower.

I will likely provide more detail on how to use Watchmaker with Terracotta in a future article, but for now here’s what to do if you want to try it out. The field that you need to configure Terracotta to share is the private workQueue field in the org.uncommons.watchmaker.framework.FitnessEvaluationWorker class. Run your unmodified program using Terracotta, then run extra instances of the FitnessEvaluationWorker on other nodes.

Remember, you also have the option of using Hadoop with Watchmaker (via the Apache Mahout project).

See the changelog for full details of changes in this release.

Understanding PHP – A Journey into the darkness…

Posted in PHP by Dan on July 31st, 2009

I knew PHP was a bit crufty before I got seriously involved with it. I’ve been trying to avoid writing a rant about how horrible it is, as the web has enough of those already and, after all, it doesn’t really matter, does it? Still, to preserve my sanity I’ve been keeping a list of everything that’s bad about PHP, mostly for my own amusement. However, the most recent entry on my list cannot be allowed to pass without comment.

"01a4" != "001a4"

We start with something simple and non-controversial. If you have two strings that contain a different number of characters, they can’t be considered equal. The leading zeros are important because these are strings, not numbers.

"01e4" == "001e4"

However, PHP doesn’t like strings. It’s looking for any excuse it can find to treat your values as numbers. And here we have it. Change one character in each of those strings (the ‘a’ becomes an ‘e’) and suddenly PHP decides that these aren’t strings any more; they are numbers in scientific notation (PHP doesn’t care that you used quotes) and they are equivalent because leading zeros are ignored for numbers. To reinforce this point, you will find that PHP also evaluates "01e4" == "10000" as true, because these are numbers with equivalent values. This is documented behaviour; it’s just not very sensible.

Enter ===

At this point the PHP apologists chime in with the suggestion to use the === operator. This is an equality operator that compares not only the values of the arguments but their types as well: both sides must have the same type as well as identical values. This doesn’t seem like it should make any difference, as the literals on both sides of the comparison already have identical types, regardless of whether that type is string or number. Of course that’s not the case, and when you use the extra equals sign the values remain as strings rather than being interpreted as numbers. "01e4" === "001e4" evaluates to false (correct, but not entirely convincing).

"0x001a4" == 0x01a4

So it seems that the rule in PHP is that if the contents of a string can be parsed as a numeric literal then, for comparisons, they are, as we see with the above hexadecimals (note the difference in notation from the first example, specifically the use of the 0x prefix). Leading zeros are ignored when numbers are involved.

"0012" != 0012

Unfortunately that’s not the full story as the final example shows. Like many other languages, PHP interprets numbers beginning with a zero as octal values, but not when that number is within a string. This is completely inconsistent with the way it processes hexadecimal values and scientific notation within strings.

10 Tips for Publishing Open Source Java Libraries

Posted in Java, Software Development by Dan on July 29th, 2009

One of the strengths of the Java ecosystem is the huge number of open source libraries that are available.  There are often several alternatives when you need a library that provides some specific functionality.  Some library authors make it easy to evaluate and use their libraries while others don’t.  Open source developers may not care whether their libraries are widely used but I suspect that many are at least partially motivated by the desire to see their projects succeed.  With that in mind, here’s a checklist of things to consider to give your open source Java library the best chance of widespread adoption.

1. Make the download link prominent.

If other people can’t figure out how to download your project, it’s not going to be very successful. I’m bemused by the number of open source projects that hide their download links some place obscure. Put it in a prominent location on the front page. Use the word “download” and use large, bold text so that it can’t be missed.

2. Be explicit about the licence.

Potential users will want to know whether your licensing is compatible with their project. Don’t make users have to download and unzip your software in order to find out which licence you use. Display this information prominently on the project’s home page (don’t leave it hidden away in some dark corner of SourceForge’s project pages).

3. Prefer Apache, BSD or LGPL rather than GPL.

Obviously you are free to release your library under any terms that you choose. It’s your work and you get to decide who uses it and how. That said, while the GPL may be a fine choice for end user applications, it doesn’t make much sense for libraries. If you pick a copyleft licence, such as the GPL, your library will be doomed to irrelevance.  Even the Free Software Foundation acknowledges this (albeit grudgingly), hence the existence of the LGPL.

The viral nature of the GPL effectively prevents commercial exploitation of your work.  This may be exactly what you want, but it also prevents your library from being used by open source projects that use a more permissive licence.  This is because they would have to abandon the non-copyleft licence and switch to your chosen licence. That isn’t going to happen.

4. Be conservative about adding dependencies.

Every third-party library that your library depends on is a potential source of pain for your users. They may already depend on a different version of the same library, which can lead to JAR Hell (such problems can be mitigated by using a tool such as Jar Jar Links to isolate dependencies). Injudicious dependencies can also greatly increase the size of your project and every project that uses it.  Don’t introduce a dependency unless it adds real value to your library.

5. Document dependencies.

Ideally you should bundle all dependent JARs with your distribution. This makes it much easier for users to get started. Regardless, you should document exactly which versions of which libraries your library requires. NoClassDefFoundError is not the most friendly way to communicate this information.

6. Avoid depending on a logging framework.

Depending on a particular logging framework will cause a world of pain for half of your users. Some people like to use Sun’s JDK logging classes to avoid an external dependency, and some people like to use Log4J because Sun’s JDK logging isn’t very good. SimpleLog is another alternative.

If you pick the “wrong” logging framework you force your users to make a difficult choice.  Either they maintain two separate logging mechanisms in their application, or they replace their preferred framework with the one you insisted that they use, or (more likely) they replace your library with something else.

For most small to medium-sized libraries, logging is not a necessity. Problems can be reported to the application code via exceptions and can be logged there. Incidental informational logging can usually be omitted (unless you’ve written something like Hibernate, which really does need trace logging so that you can figure out what is going on).

7. If you really need logging, use an indirect dependency.

OK, so not all libraries can realistically avoid logging.  The solution is to use a logging adapter such as SLF4J.  This allows you to write log messages and your users to have the final say over which logging back-end gets used.
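For instance, a library class written against SLF4J might look like the sketch below. The WidgetProcessor class is made up for illustration; the point is that the library only ever touches the SLF4J API, and the application chooses the logging back-end by deciding which SLF4J binding to put on the classpath.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WidgetProcessor // A made-up library class, for illustration only.
{
    // The library codes against the SLF4J API and nothing else.
    private static final Logger LOG = LoggerFactory.getLogger(WidgetProcessor.class);

    public void process(int widgetCount)
    {
        // Whichever binding the application provides (Log4J, java.util.logging,
        // logback, etc.) will handle this message.
        LOG.debug("Processing {} widgets.", widgetCount);
    }
}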

8. Make the Javadocs available online.

Some libraries only include API documentation in the download or, worse still, don’t generate it at all. If you’re going to have API documentation (and it’s not exactly much effort with Javadoc), put it on the website. Potential users can get a feel for an API by browsing its classes and methods.

9. Provide a minimal example.

In an ideal world your library will be accompanied by a beautiful user manual complete with step-by-step examples for all scenarios. In the real world all we want is a code snippet that shows how to get started with the library. Your online Javadocs can be intimidating if we don’t know which classes to start with.

10. Make the JAR files available in a Maven repository.

This is one that I haven’t really followed through on properly for all of my projects yet, though I intend to. That’s because I don’t use Maven, but some people like to. These people will be more likely to use your library if you make the JAR file(s) available in a public Maven repository (such as Java.net’s). You don’t have to use Maven yourself to do this, as there is a set of Ant tasks that you can use to publish artifacts.

Programming the Semantic Web and Beautiful Data

Posted in Books by Dan on June 27th, 2009

As I’ve mentioned previously, I’m a big fan of Toby Segaran’s book Programming Collective Intelligence. It introduces several cutting-edge algorithms for building intelligent web applications through a well-chosen set of compelling example programs. A different author might have made the book a dull, overly mathematical ordeal, but Segaran manages to inspire the reader to find ways to apply these exotic techniques in their own projects. I was therefore interested to discover that he has since collaborated on two new books that will both be released in July.

For Programming the Semantic Web, Segaran has teamed up with Colin Evans and Jamie Taylor. I was unable to find a table of contents for this book but the publisher’s blurb suggests that it will follow the same pragmatic, hands-on formula that worked so well for Programming Collective Intelligence:

With this book, the promise of the Semantic Web — in which machines can find, share, and combine data on the Web — is not just a technical possibility, but a practical reality. Programming the Semantic Web demonstrates several ways to implement semantic web applications, using existing and emerging standards and technologies. With this book, you will learn how to incorporate existing data sources into semantically aware applications and publish rich semantic data.

Programming the Semantic Web will help you:

  • Learn how the semantic web allows new and unexpected uses of data to emerge
  • Understand how semantic technologies promote data portability with a simple, abstract model for knowledge representation
  • Be familiar with semantic standards, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL)
  • Make use of semantic programming techniques to both enrich and simplify current web applications
  • Learn how to incorporate existing data sources into semantically aware applications

Each chapter walks you through a single piece of semantic technology, and explains how you can use it to solve real problems. Whether you’re writing a simple “mashup” or maintaining a high-performance enterprise solution, this book provides a standard, flexible approach for integrating and future-proofing systems and data.

Toby has clearly been keeping himself busy because he’s also found time to co-edit the latest installment in O’Reilly’s Beautiful Code series. In 2007 the original Beautiful Code book presented an eclectic mix of 33 essays about elegance in software design and implementation, each written by a different well-known programmer. The success of this anthology has resulted in O’Reilly issuing three companion volumes in 2009: Beautiful Architecture, Beautiful Security and the forthcoming Beautiful Data: The Stories Behind Elegant Data Solutions (edited by Toby Segaran and Jeff Hammerbacher).

Beautiful Data follows the same format as the other books in the series, with each chapter authored by different expert practitioners. One of these chapters covers the making of the video for Radiohead’s House of Cards single, while another is about data processing challenges faced by NASA’s Mars exploration program.

Clearly I haven’t read either of these books because they are not available yet, so I can’t make any informed recommendations, but they do both look like they could be interesting.

Opera Unite Divides Opinion

Posted in The Internet by Dan on June 17th, 2009

Opera Software would have you believe that yesterday they reinvented the web.  The launch of their new Opera Unite service has received a decent amount of publicity. By now you’ve probably heard all about it, but if not you can read the details here.

The 10 second summary is that version 10 of Opera’s web browser contains a web server that allows users to serve web content directly from their desktop machines or laptops. However, this description doesn’t really capture the potential of the platform.

Some commentators have dismissed the announcement with a “so what?”. Opera Unite content is only going to be available while the user’s computer is switched on and running Opera and will be constrained by their available upload bandwidth (which often isn’t much thanks to the ‘A’ in ADSL). That doesn’t really cut it when compared to low-cost web hosting packages capable of serving thousands of users, but then the comparison isn’t particularly helpful.

I don’t need Opera Unite to host my personal website from my desktop. I can install and configure Apache, tweak my firewall/router settings and find a solution to dynamic IP address issues. The point is that with Opera Unite, you don’t have to do any of that.  Opera have completely eliminated all of that hassle and in doing so have made web serving accessible to even non-technical users.  But that’s only half of the story. Serving your personal home page via Opera Unite is still sub-optimal. If you want (semi-)permanent web hosting, pay for some cheap PHP hosting or get a WordPress.com account.

If somebody gives you an Opera Unite URL, you shouldn’t expect that resource to be still around tomorrow or next year like you would with a link to Wikipedia. The real value in Opera Unite is in ad hoc sharing and transient collaboration. Things that were possible but bothersome previously are now trivial because you don’t have to worry about server configuration and networking issues.

For example, say I wanted to invite every reader of this blog to join a chat session. I could try to find out which IM clients you all use and try to arrange something via MSN Messenger, Skype or Google Talk. Or I could install and configure my own IRC server. Or I could try to find a third-party server to host the chat room. With Opera Unite I can simply open up my lounge and give you all the URL (regardless of which browser you happen to be using). It just takes a few clicks. The service is transient.  When we’re done, I kick you all out.

In our chat session I might decide to share some photos or other files with you.  I could send them via e-mail or upload them to an FTP server or a service like Flickr, but again it’s simpler with Unite. I just enable the appropriate service and share the URL. You can browse my shared directory and grab what you want directly from my machine. The link probably won’t work tomorrow, but you won’t need it tomorrow. Temporary is fine when it’s this easy.

The other service that I’m already finding useful is the media player, which enables me to remotely play my home MP3 collection from the office. The Unite platform is based on open standards, so it will be interesting to see what other ideas for services people come up with.

Escape Analysis in Java 6 Update 14 – Some Informal Benchmarks

Posted in Java by Dan on May 31st, 2009

Sun recently released update 14 of the Java 6 JDK and JRE.  As well as the usual collection of bug fixes, this release includes some experimental new features designed to improve the performance of the JVM (see the release notes).  One of these is Escape Analysis.

To see what kind of impact escape analysis might have on my applications, I decided to try it on a couple of my more CPU-intensive Java programs.  Escape analysis is turned off by default since it is still experimental.  It is enabled using the following command-line option:

-XX:+DoEscapeAnalysis
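For anyone unfamiliar with the technique, escape analysis lets the JIT compiler identify objects that never “escape” the method that creates them, so their heap allocation can potentially be optimised away. Neither of the benchmarks below is published, but the kind of code that stands to benefit looks something like this deliberately contrived sketch, which allocates a short-lived object on every iteration of a hot loop:

import java.util.Random;

public class EscapeAnalysisDemo
{
    // A small, immutable value object that is allocated on every iteration below.
    private static final class Sample
    {
        private final double value;

        Sample(double value)
        {
            this.value = value;
        }

        double squared()
        {
            return value * value;
        }
    }

    public static void main(String[] args)
    {
        Random rng = new Random();
        double sum = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < 50000000; i++)
        {
            // The Sample object never escapes this method, so with
            // -XX:+DoEscapeAnalysis the JIT may be able to avoid
            // allocating it on the heap at all.
            Sample sample = new Sample(rng.nextGaussian());
            sum += sample.squared();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Sum: " + sum + " (" + elapsed + "ms)");
    }
}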

Benchmark 1

The first program I tested is a statistical simulation.  Basically it generates millions of random numbers (using Uncommons Maths naturally) and does a few calculations.

VM Switches: -server
95 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
73 seconds

Performance improvement using Escape Analysis: 23%

Benchmark 2

The second program I tested is an implementation of non-negative matrix factorisation.

VM Switches: -server
22.6 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
20.8 seconds

Performance improvement using Escape Analysis: 8%

Conclusions

These benchmarks are neither representative nor comprehensive. Nevertheless, for certain types of program the addition of escape analysis appears to be another significant step forward in JVM performance.

Upgrading Obsolete Ubuntu Systems

Posted in Linux by Dan on May 30th, 2009

Ubuntu 7.10 (Gutsy Gibbon) recently reached EOL (end-of-life). Presumably to discourage people from continuing to use a distro that is officially dead, the package repositories for an EOL release disappear from archive.ubuntu.com, which means that if you try to use apt-get to install or upgrade your software you will get 404 errors.

The recommended action for anyone still running 7.10 is to use the do-release-upgrade command to (relatively) painlessly upgrade to a newer, supported version of Ubuntu (remembering to take a backup first, just in case). The one little catch with this solution is that without access to the package repository, you won’t be able to install the upgrade tool if you don’t already have it.

Fortunately, the Gutsy repository hasn’t been removed completely; it has just been relocated to old-releases.ubuntu.com. So if you edit /etc/apt/sources.list and replace all occurrences of archive.ubuntu.com with old-releases.ubuntu.com, you will again be able to access the packages for 7.10. You should then install the update-manager-core package to enable the upgrade.

sudo vi /etc/apt/sources.list
sudo apt-get install update-manager-core

After doing this and before upgrading, it is important to revert the changes made to the sources.list file (i.e. change it back to using archive.ubuntu.com). This is because the distro upgrade will replace all references to ‘gutsy’ with ‘hardy’ (as in Hardy Heron, the Ubuntu 8.04 release) but will not change the repository addresses. Since Hardy is hosted at archive.ubuntu.com, leaving it as old-releases.ubuntu.com will cause the upgrade to fail.

sudo vi /etc/apt/sources.list
sudo do-release-upgrade

If all goes well you will end up with a fully functioning Ubuntu 8.04 system. Hardy Heron is the current LTS (Long Term Support) release. You have the option of a further upgrade to 9.04 (Jaunty Jackalope), but although it is a more recent release, it will reach EOL earlier because there is no long term support commitment for Jaunty. Jaunty will reach EOL in October 2010 whereas Hardy will be supported until April 2011 for desktops and April 2013 for servers.

SICP – The most divisive book in Computer Science?

Posted in Books by Dan on May 28th, 2009

Structure and Interpretation of Computer Programs (universally referred to as SICP) seems to be mentioned whenever people are discussing the great/classic/essential Computer Science books. It typically generates a mixed response. Somebody recently sent a copy (anonymously?) to Python creator Guido van Rossum, apparently as a comment on his supposed ignorance (incidentally, this is an incredibly arsey thing to do). It seems that SICP is a real love-it-or-hate-it kind of book. Depending on who you listen to, it’s either a mind-bending classic through which true enlightenment can be achieved, or it’s dull, obvious and poorly written. The distribution of the reviews for SICP on Amazon (UK) is striking:

[Image: distribution of Amazon SICP review scores]

If you haven’t already read it, you can decide for yourself. The whole thing is available online. I didn’t get very far the one time I started to read it. I quickly got bored with the introductory stuff, but I intend to give it another go sometime. I’ve seen several people recommend the associated video lectures, which may be a better entry point.

Watchmaker 0.6.0 – Evolutionary Computation for Java

Posted in Evolutionary Computation, Java by Dan on April 26th, 2009

Version 0.6.0 of the Watchmaker Framework for Evolutionary Computation is now available for download. This release incorporates several minor changes that I’ve been making over the last few months.  Consult the changelog for full details, but here are the highlights:

Numerous Improvements to the Evolution Monitor and other Swing Components

The Watchmaker Swing library provides a collection of GUI components that simplify the process of building user interfaces for evolutionary programs. These components have received many improvements for version 0.6.0. As well as controls for manipulating evolution parameters while the program is running, the library also provides an Evolution Monitor component. This provides real-time information about the state of the program, including a view of the fittest candidate so far and a graph showing changes in population fitness over time.

Upgraded to Uncommons Maths 1.2

This means even faster RNGs are available for you to use. It also means that we now use the Uncommons Maths Probability class rather than duplicating it in the framework (this means you may have to change some imports in your code when upgrading from Watchmaker 0.5.x).

Caching Fitness Evaluator

Version 0.6.0 introduces the CachingFitnessEvaluator class. This is a wrapper that provides caching for existing FitnessEvaluator implementations. The results of fitness evaluations are cached so that if the same candidate is evaluated twice, the expense of the fitness calculation can be avoided the second time. The cache uses weak references in order to avoid memory leakage.

Caching of fitness values can be a useful optimisation in situations where the fitness evaluation is expensive and there is a possibility that some candidates will survive from generation to generation unmodified. Programs that use elitism are one example of candidates surviving unmodified. Another scenario is when the configured evolutionary operator does not always modify every candidate in the population for every generation.

Caching of fitness scores is provided as an option rather than as the default Watchmaker Framework behaviour because caching is only valid when fitness evaluations are isolated and repeatable. An isolated fitness evaluation is one where the result depends only upon the candidate being evaluated. This is not the case when candidates are evaluated against the other members of the population.
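Using the cache is just a matter of wrapping your existing evaluator before handing it to the evolution engine. In the sketch below only the CachingFitnessEvaluator and FitnessEvaluator types are part of the framework (assuming the FitnessEvaluator interface of getFitness plus isNatural); the character-counting evaluator is a stand-in for something genuinely expensive:

import java.util.Collections;
import java.util.List;
import org.uncommons.watchmaker.framework.CachingFitnessEvaluator;
import org.uncommons.watchmaker.framework.FitnessEvaluator;

public class CachingExample
{
    // A made-up evaluator: scores a string by counting its 'x' characters.
    // Imagine this calculation being much more expensive.
    private static final class CharCountEvaluator implements FitnessEvaluator<String>
    {
        public double getFitness(String candidate, List<? extends String> population)
        {
            int count = 0;
            for (char c : candidate.toCharArray())
            {
                if (c == 'x')
                {
                    count++;
                }
            }
            return count;
        }

        public boolean isNatural()
        {
            return true; // Higher fitness is better.
        }
    }

    public static void main(String[] args)
    {
        // The wrapper caches results, so evaluating the same candidate twice
        // (e.g. an elite individual that survives unmodified) costs almost
        // nothing the second time.
        FitnessEvaluator<String> evaluator =
            new CachingFitnessEvaluator<String>(new CharCountEvaluator());
        List<String> population = Collections.emptyList();
        System.out.println(evaluator.getFitness("xyzxx", population));
        System.out.println(evaluator.getFitness("xyzxx", population)); // Cache hit.
    }
}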

Mona Lisa Example

After seeing Roger Alsing’s evolution of the Mona Lisa, I was inspired to try to reproduce it using the Watchmaker Framework. I didn’t follow Roger’s methodology but I have come up with something similar. My results aren’t as impressive as his latest efforts but may be interesting anyway. This example was actually included in version 0.5.1 but I didn’t draw attention to it. In 0.6.0 I’ve improved performance and used it to demonstrate the Watchmaker GUI components mentioned above.  You can try it for yourself here.  Maybe you can come up with a combination of parameters that works better than the defaults I have provided?

Useful Watchmaker Links

If you are new to Evolutionary Computation in Java, my previous articles on the subject may also be of interest.
