Developing with SoyLatte – First Impressions

Posted in Java, Mac by Dan on December 29th, 2007

Apple recently released an updated preview of Java 6, but only for 64-bit Intel Macs running Leopard. With the company remaining stubbornly silent on its future Java plans, nobody knows whether 32-bit Macs or the Tiger operating system will ever be supported.

For those developers who don’t want to be forced to upgrade to Leopard or to discard their perfectly adequate CoreDuo Macs, there is an alternative to playing Apple’s waiting game thanks to the efforts of Landon Fuller and his SoyLatte project.

Getting Started

SoyLatte is an X11-based port of the FreeBSD patchset for Java 6. This gives you a non-Apple JDK solution right now and on your terms (no hardware or software upgrade required). The only thing that SoyLatte lacks is Apple’s polished native look-and-feel for Swing and AWT applications (there is no seamless integration with the OS X desktop). This is because it uses X11 for display purposes (in the same way that the Mac port of OpenOffice works). If you are doing non-GUI development none of this will matter to you.

Download and installation of SoyLatte is straightforward. It’s just an archive, containing the pre-built JDK binaries, that can be dropped anywhere you please (mine is under /usr/local). Change your environment variables (JAVA_HOME, PATH, etc.) as appropriate and you are ready to go for command-line development.
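With more than one JDK on the machine, it is easy to lose track of which one a given shell is using. The tiny class below is just a sketch for checking this (the class name WhichJdk is made up for illustration). It reads only the standard java.version, java.vendor and java.home system properties.

```java
// Hypothetical helper: reports which JVM is actually running this code.
final class WhichJdk {

    // Builds a one-line summary from standard system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + ", java.vendor=" + System.getProperty("java.vendor")
            + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile and run it from the terminal after changing JAVA_HOME and PATH. If java.home does not point into the directory where you unpacked SoyLatte, the shell is still picking up a different JDK.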

Developing

My reason for requiring Java 6 was to build and hack Hudson. Hudson is Kohsuke Kawaguchi’s open source, extensible continuous integration server. Hudson will run on Java 5 but it requires Java 6 for development. To make Mac-based development on Hudson feasible I would have to get SoyLatte to play nicely with IntelliJ IDEA and Maven.

Integrating SoyLatte with IDEA was a piece of cake. It’s just a matter of adding a new JDK in the settings dialog (just point it at your installation directory). I did have to go back and manually add the SoyLatte JRE libs to the JDK classpath as these were not picked up automatically (resulting in the IDE not being able to find classes like String and ArrayList), but once that was done everything worked perfectly. Crucially, even though I was using Java 6 classes and tools, I was still running the actual IDE using Apple’s Java 5, so I didn’t have to use X11.

Running Maven from the command line presents no new problems. It will just use whichever JDK JAVA_HOME points to. Unfortunately the IDEA Maven plugin uses whatever the IDE is using and doesn’t allow this to be changed. This is in contrast to IDEA’s Ant support, which allows the JDK to be specified explicitly. The inconsistency is probably because the Ant plugin was written by JetBrains whereas the Maven plugin is a community contribution. Running IDEA itself under SoyLatte would address this issue but, for me at least, using Maven from a terminal is preferable to running IDEA under X11.

On Maven

Posted in Java by Dan on December 20th, 2007

If I ever wanted to write down the issues I have with Maven, I couldn’t do a better job than Charles Miller’s post. Maven solves problems that I didn’t have by introducing me to a whole set of new problems that I also didn’t have.

Ant is certainly not perfect (though you can learn to use it effectively) and I remain convinced that a better solution to Java build problems can be found. However, if Maven is the answer, then we are asking the wrong question.

The Perils of Web Development, the Importance of Testing and Why 95% of the World Couldn’t See My Page

Posted in Software Development by Dan on December 17th, 2007

Write the web page. Check how it looks in browser of choice. Validate the XHTML. Validate the CSS. Double-check in alternative browser. Job done. Surely the worst that can happen in other browsers is that some things are a few pixels out of alignment?

Not quite. Because, despite being well-formed and perfectly valid (it even has a nice W3C badge on the bottom), most of the content was completely invisible in the two most popular web browsers.

The problem is Internet Explorer’s handling of the <script> tag. IE will not let you use a “self-closed” script tag (e.g. <script src="myscript.js" />). It’s well-formed XML, and perfectly valid XHTML, just like the other self-closed tags IE allows, but IE assumes it is unclosed and therefore ignores the remainder of the content (at least until it hits a </script> tag belonging to another script further down the page). Brilliant. I’ve no idea what the justification for this is and, to be honest, I don’t care. I’d like to curse the ineptitude of Microsoft, change the page and that would be that. But no, I can’t just ignore it because it’s a bloody conspiracy. Firefox behaves in exactly the same way. So the lesson to take from this is: Opera and Safari – sensible; IE and Firefox – insane.

To be honest, I was already aware of this bug, in IE at least, because I had come across it once before (this makes it doubly frustrating). On that occasion I was editing the XHTML with IntelliJ IDEA, which very helpfully highlighted the problematic tag with a warning that it would not work with IE. For this page I used Vim, so no such help.
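For anyone else editing XHTML without an IDE to nag them, a crude check is easy to script. What follows is only a sketch: the class and method names are invented for illustration, and it uses a regular expression rather than a real parser, so it will be fooled by a ">" inside an attribute value or by script tags inside comments. It finds self-closed script tags and can rewrite them with an explicit closing tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: spots <script ... /> tags in a string of markup.
final class SelfClosedScriptCheck {

    // An opening script tag that ends in "/>". Group 1 captures the attributes.
    private static final Pattern SELF_CLOSED =
        Pattern.compile("<script\\b([^>]*?)\\s*/>", Pattern.CASE_INSENSITIVE);

    // Returns every self-closed script tag found, in document order.
    static List<String> findSelfClosed(String markup) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = SELF_CLOSED.matcher(markup);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Rewrites each self-closed script tag as <script ...></script>.
    static String expand(String markup) {
        return SELF_CLOSED.matcher(markup).replaceAll("<script$1></script>");
    }
}
```

Calling findSelfClosed on the page source before publishing would have flagged the offending tag, and expand produces the form that all four browsers accept.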

What is most annoying, and embarrassing, is that this page has been like this for ages (I can’t remember how long), displaying nothing but the Feedburner view of this blog, and a big white space, to most visitors (and sometimes I get literally several hits a week). I admit that the rest of the content is very minimal and not particularly interesting, but I’d prefer it if it was the visitors that chose to ignore it rather than their browsers.

Yes, yes… I know I was asking for trouble by neglecting to test in either of the most popular browsers. I have learned my lesson (until next time at least). Assumption truly is the mother of all fuck-ups.

Google takes on Wikipedia

Posted in The Internet by Dan on December 15th, 2007

The BBC brings news of Google’s plans for an online encyclopedia to rival Wikipedia.

The new project, called Knol, attempts to address some of Wikipedia’s shortcomings by putting more emphasis on respected authors and peer-reviewed content. In exchange for contributing, authors will receive a share of the ad revenue for their pages. Meanwhile, Wikipedia steadfastly refuses to display adverts, and instead relies on charitable donations to cover its costs.

This sounds a lot like the Scholarpedia project that I wrote about previously. But Scholarpedia lacks the considerable backing of the Google machine or the financial incentives of Adsense.

So will Google crush Wikipedia? Will Wikipedia have to adapt to survive? Or is it too entrenched already for Google’s efforts to have any real impact?

Wikipedia’s ad hoc editing certainly results in some interesting articles. During this year’s World Cup I found 3 separate pages detailing rugby player Jonny Wilkinson’s international points-scoring record, each with a wildly different number (including one that put him hundreds of points ahead of all-time record holder Neil Jenkins). Other things Wikipedia has taught me in the last year are that Clash frontman Joe Strummer was in favour of AIDS and global warming (or perhaps it was just a poorly constructed sentence), and that billionaire Chelsea Football Club owner Roman Abramovich is in fact a dustman.

Watchmaker Framework for Evolutionary Computation 0.4.3

Posted in Evolutionary Computation, Java by Dan on December 14th, 2007

This is mostly a maintenance release. Uncommons Maths is now a separate project so the Watchmaker Framework has been modified to use the official version of that library. There are a few other minor tweaks (a couple of classes have been moved around, but nothing in the core framework).

Version 0.4.3 also introduces an experimental EvolutionMonitor component. This is a Swing view that gives you some insight into the current state of the population while your evolutionary algorithm is running. In this first version all it does is graph the mean and peak fitness scores (using JFreeChart). Future versions will hopefully display more information (perhaps I will add an API to enable data to be extracted from the population while running). The EvolutionMonitor implements the EvolutionObserver interface so you can hook it up easily by calling the addEvolutionObserver method of your EvolutionEngine.
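To illustrate the observer arrangement without pulling in the framework, here is a self-contained sketch. None of these types are Watchmaker classes: GenerationStats, FitnessObserver, ToyEngine and RecordingMonitor are stand-ins invented for this example, and the sketch assumes that a higher fitness score is better. The monitor simply records the mean and peak fitness of each generation, which is the data the real component graphs.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are not the Watchmaker API.
final class ObserverSketch {

    // Hypothetical per-generation summary passed to observers.
    static final class GenerationStats {
        final int generation;
        final double meanFitness;
        final double peakFitness;

        GenerationStats(int generation, double meanFitness, double peakFitness) {
            this.generation = generation;
            this.meanFitness = meanFitness;
            this.peakFitness = peakFitness;
        }
    }

    // Hypothetical callback interface, notified once per generation.
    interface FitnessObserver {
        void populationUpdate(GenerationStats stats);
    }

    // Toy engine that only knows how to notify its observers.
    static final class ToyEngine {
        private final List<FitnessObserver> observers = new ArrayList<FitnessObserver>();

        void addObserver(FitnessObserver observer) {
            observers.add(observer);
        }

        // Summarises one generation's scores (assumes higher is fitter).
        void reportGeneration(int generation, double[] fitnesses) {
            if (fitnesses.length == 0) {
                throw new IllegalArgumentException("population is empty");
            }
            double total = 0;
            double peak = Double.NEGATIVE_INFINITY;
            for (double fitness : fitnesses) {
                total += fitness;
                peak = Math.max(peak, fitness);
            }
            GenerationStats stats =
                new GenerationStats(generation, total / fitnesses.length, peak);
            for (FitnessObserver observer : observers) {
                observer.populationUpdate(stats);
            }
        }
    }

    // Keeps the history that a graphical monitor would plot.
    static final class RecordingMonitor implements FitnessObserver {
        final List<GenerationStats> history = new ArrayList<GenerationStats>();

        public void populationUpdate(GenerationStats stats) {
            history.add(stats);
        }
    }
}
```

Because the engine depends only on the callback interface, a graphing monitor, a logger or a test harness can be attached without the engine knowing which one it is talking to.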

The other new feature is a termination condition that stops the algorithm when the population fitness begins to stagnate. If this condition is used and there is no fitness improvement within a specified number of generations, the evolution engine will assume that no further improvement can be made and will return the fittest individual found so far. This is often a more practical approach than specifying a maximum total number of generations or a fixed time limit in advance.
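The idea behind a stagnation check fits in a few lines. The class below is a sketch of the general technique and not the framework's own implementation. The name StagnationCheck and its method are invented for illustration, and it assumes that a higher fitness score is better.

```java
// Hypothetical stagnation detector; not the Watchmaker class.
final class StagnationCheck {

    private final int generationLimit;
    private double bestFitness = Double.NEGATIVE_INFINITY;
    private int generationsWithoutImprovement = 0;

    // generationLimit: how many consecutive generations may pass
    // without improvement before giving up.
    StagnationCheck(int generationLimit) {
        if (generationLimit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.generationLimit = generationLimit;
    }

    // Call once per generation with that generation's best fitness.
    // Returns true when the run should stop.
    boolean shouldTerminate(double currentBestFitness) {
        if (currentBestFitness > bestFitness) {
            // Improvement (assumes higher is fitter): reset the counter.
            bestFitness = currentBestFitness;
            generationsWithoutImprovement = 0;
        } else {
            generationsWithoutImprovement++;
        }
        return generationsWithoutImprovement >= generationLimit;
    }
}
```

With a limit of 3, a run whose best score goes 1.0, 2.0, 2.0, 2.0, 2.0 stops on the fifth generation, because that is the third in a row with no improvement on 2.0.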

Book Review: Programming Collective Intelligence

Posted in Evolutionary Computation, Python, Software Development by Dan on December 13th, 2007

It’s called “Programming Collective Intelligence” and is presented as a book for building “Smart Web 2.0 Applications” but it is essentially an extremely accessible explanation of a wide array of machine learning and data-mining algorithms. How do sites like Amazon and Last.FM make recommendations? How do search engines work? How does Google News manage to categorise and present the most important news articles without human intervention? How do you build a useful spam filter?

All of these questions are answered and compelling example applications are built step-by-step to demonstrate the power of the ideas presented here. Decision trees, genetic algorithms, neural networks, support vector machines, genetic programming, Bayesian classifiers and non-negative matrix factorisation are some of the techniques covered and all without the dry, maths-heavy text that normally fills books on these topics.

The examples throughout are exclusively in Python, which may have put me off had I realised this when I ordered it. I have nothing against Python except for my complete lack of experience with it. However, the examples are easy enough to understand for anybody familiar with other high-level languages. As a result of reading the book, I may actually try my hand at a bit of Python hacking now.

How well do these techniques work? Well, I’d never have found out about this book but for Amazon’s automated recommendations system. I’d thoroughly recommend this book to anyone looking to learn about interesting AI techniques without wading through opaque academic papers.

(If you find the genetic algorithms and genetic programming topics interesting, check out the Watchmaker Framework for Evolutionary Computation and some of the books recommended there.)