Distributed Evolutionary Algorithms with Watchmaker and Hadoop

Posted in Evolutionary Computation, Java by Dan on October 1st, 2008

One feature that has been on the TODO list of the Watchmaker Framework for Evolutionary Computation for some time is the ability to distribute the evolution across several machines.  Some time last year I started on a RMI-based solution, but I wasn’t happy with it so I deleted it and put the idea on the back burner while I concentrated on other things.  At some point I wanted to investigate using Terracotta, or possibly Hadoop, to distribute the computations.

However, it’s often the case with Open Source software that somebody smarter comes along and does the hard work for you.  I was delighted to find out today that Abdel Hakim Deneche has been busy integrating Watchmaker with the Apache Mahout project as part of Google’s Summer of Code programme.

I’d never heard of Mahout before.  According to Wikipedia, a Mahout is somebody who drives an elephant.  Apache Mahout is a sub-project of Lucene, the Java text search and indexing engine.  The Mahout project is focused on building scalable machine-learning libraries using Hadoop (presumably where the elephant connection comes in).

I haven’t yet tried using the Mahout software, but it looks like it provides a pretty straightforward way to distribute the fitness evaluations for just about any evolutionary algorithm implemented using Watchmaker.