10 Tips for Publishing Open Source Java Libraries

Posted in Java, Software Development by Dan on July 29th, 2009

One of the strengths of the Java ecosystem is the huge number of open source libraries that are available.  There are often several alternatives when you need a library that provides some specific functionality.  Some library authors make it easy to evaluate and use their libraries while others don’t.  Open source developers may not care whether their libraries are widely used but I suspect that many are at least partially motivated by the desire to see their projects succeed.  With that in mind, here’s a checklist of things to consider to give your open source Java library the best chance of widespread adoption.

1. Make the download link prominent.

If other people can’t figure out how to download your project, it’s not going to be very successful. I’m bemused by the number of open source projects that hide their download links some place obscure. Put it in a prominent location on the front page. Use the word “download” and use large, bold text so that it can’t be missed.

2. Be explicit about the licence.

Potential users will want to know whether your licensing is compatible with their project. Don’t make users have to download and unzip your software in order to find out which licence you use. Display this information prominently on the project’s home page (don’t leave it hidden away in some dark corner of SourceForge’s project pages).

3. Prefer Apache, BSD or LGPL over GPL.

Obviously you are free to release your library under any terms that you choose. It’s your work and you get to decide who uses it and how. That said, while the GPL may be a fine choice for end-user applications, it doesn’t make much sense for libraries. If you pick a strong copyleft licence, such as the GPL, your library will be doomed to irrelevance.  Even the Free Software Foundation acknowledges this (albeit grudgingly), hence the existence of the LGPL.

The viral nature of the GPL effectively prevents your work from being used in closed-source commercial products.  This may be exactly what you want, but it also prevents your library from being used by open source projects that use a more permissive licence.  This is because they would have to abandon the non-copyleft licence and switch to your chosen licence. That isn’t going to happen.

4. Be conservative about adding dependencies.

Every third-party library that your library depends on is a potential source of pain for your users. They may already depend on a different version of the same library, which can lead to JAR Hell (such problems can be mitigated by using a tool such as Jar Jar Links to isolate dependencies). Injudicious dependencies can also greatly increase the size of your project and every project that uses it.  Don’t introduce a dependency unless it adds real value to your library.

5. Document dependencies.

Ideally you should bundle all dependent JARs with your distribution. This makes it much easier for users to get started. Regardless, you should document exactly which versions of which libraries your library requires. NoClassDefFoundError is not the most friendly way to communicate this information.
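One way to act on this, sketched here under assumptions, is for the library to check for its required classes up front and fail with a message that names them. The DependencyCheck class and the class names passed to it are hypothetical; the only real API relied on is the JDK's Class.forName(String, boolean, ClassLoader).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical start-up check: reports missing third-party classes by name
 * instead of letting a NoClassDefFoundError surface later.
 */
final class DependencyCheck {
    private DependencyCheck() {}

    /** Returns the names of any classes that cannot be found on the classpath. */
    static List<String> findMissing(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // 'false' means static initialisers are not run; we only test for presence.
                Class.forName(name, false, DependencyCheck.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Throws an IllegalStateException listing every missing class. */
    static void require(String... classNames) {
        List<String> missing = findMissing(classNames);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing required classes " + missing
                    + "; see the documentation for the exact library versions required.");
        }
    }
}
```

A library might call DependencyCheck.require("org.example.widgets.Widget") (a made-up class name) from a factory method, so a missing JAR is reported before any real work starts. It complements the documentation rather than replacing it.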

6. Avoid depending on a logging framework.

Depending on a particular logging framework will cause a world of pain for half of your users. Some people like to use Sun’s JDK logging classes to avoid an external dependency, while others prefer Log4J because Sun’s JDK logging isn’t very good. SimpleLog is another alternative.

If you pick the “wrong” logging framework you force your users to make a difficult choice.  Either they maintain two separate logging mechanisms in their application, or they replace their preferred framework with the one you insisted that they use, or (more likely) they replace your library with something else.

For most small- to medium-sized libraries, logging is not a necessity. Problems can be reported to the application code via exceptions and can be logged there.  Incidental informational logging can usually be omitted (unless you’ve written something like Hibernate, which really does need trace logging so that you can figure out what is going on).
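A minimal sketch of that division of labour, assuming a hypothetical configuration parser: the library throws an exception carrying enough context to be useful, and only the application decides whether and how to log it (here with the JDK's java.util.logging, purely as an example). None of these class names come from a real library.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical checked exception that carries the context a caller needs to log. */
class ConfigParseException extends Exception {
    private final int lineNumber;

    ConfigParseException(String message, int lineNumber, Throwable cause) {
        super(message + " (line " + lineNumber + ")", cause);
        this.lineNumber = lineNumber;
    }

    int getLineNumber() {
        return lineNumber;
    }
}

/** Library code: reports problems to the caller and never logs. */
final class ConfigParser {
    private ConfigParser() {}

    static int parsePort(String text, int lineNumber) throws ConfigParseException {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new ConfigParseException("Invalid port '" + text + "'", lineNumber, e);
        }
    }
}

/** Application code: chooses the logging framework and what gets logged. */
class ExampleApplication {
    private static final Logger LOG = Logger.getLogger(ExampleApplication.class.getName());

    public static void main(String[] args) {
        try {
            System.out.println(ConfigParser.parsePort("8080", 1));
            ConfigParser.parsePort("eighty", 2);
        } catch (ConfigParseException e) {
            LOG.log(Level.WARNING, "Could not read configuration", e);
        }
    }
}
```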

7. If you really need logging, use an indirect dependency.

OK, so not all libraries can realistically avoid logging.  The solution is to use a logging adapter such as SLF4J.  This allows you to write log messages and your users to have the final say over which logging back-end gets used.
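The idea behind such an adapter can be sketched in plain Java: library code depends only on a small interface, and the application plugs in the backend. LogSink and LibraryLog below are hypothetical names for illustration. SLF4J itself works differently (library code calls org.slf4j.LoggerFactory.getLogger(...) and the backend is found on the classpath), so in real code you would use SLF4J's API rather than rolling your own.

```java
import java.util.Objects;

/** Hypothetical backend interface that the application implements. */
interface LogSink {
    void log(String level, String message);
}

/** Hypothetical facade that library code calls instead of a concrete framework. */
final class LibraryLog {
    // Until the application installs a backend, messages are silently discarded.
    private static volatile LogSink sink = (level, message) -> { };

    private LibraryLog() {}

    /** Called by the application to route library messages to its chosen framework. */
    static void setSink(LogSink newSink) {
        sink = Objects.requireNonNull(newSink);
    }

    static void debug(String message) {
        sink.log("DEBUG", message);
    }

    static void warn(String message) {
        sink.log("WARN", message);
    }
}
```

An application that uses java.util.logging, Log4J or anything else would install a LogSink that forwards each message to its own logger; the library never needs to know which one.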

8. Make the Javadocs available online.

Some libraries only include API docs in the download or, worse still, don’t generate them at all.  If you’re going to have API documentation (and it’s not exactly much effort with Javadoc), put it on the website. Potential users can get a feel for an API by browsing its classes and methods.

9. Provide a minimal example.

In an ideal world your library will be accompanied by a beautiful user manual complete with step-by-step examples for all scenarios. In the real world all we want is a code snippet that shows how to get started with the library. Your online Javadocs can be intimidating if we don’t know which classes to start with.

10. Make the JAR files available in a Maven repository.

This is one that I haven’t yet followed through on properly for all of my projects, though I intend to. That’s because I don’t use Maven myself, but some people do. These people will be more likely to use your library if you make the JAR file(s) available in a public Maven repository (such as Java.net’s). You don’t have to use Maven to do this, as there is a set of Ant tasks that you can use to publish artifacts.

Netflix Prize Snatched Away at Last Moment?

Posted in Software Development, The Internet by Dan on July 26th, 2009

30 days ago, the BellKor’s Pragmatic Chaos team submitted the first qualifying solution for the $1 million Netflix prize.  Once a team reaches the 10% improvement threshold, other teams have 30 days to submit further entries, and the prize is awarded to the best-performing solution at the end of that period.

BellKor achieved 10.05% on 26th June and have since moved on to 10.08%.  Several teams that were close to the qualifying mark responded by forming coalitions in a frantic race to find a hybrid solution that would surpass BellKor’s mark before the end of the 30-day period.

The Ensemble is one of these super teams.  They achieved the 10% mark two days ago and then today, on the very last day of the competition, they appear to have dramatically snatched the prize with a submission that is just 0.01% better than BellKor’s.

UPDATE: BellKor subsequently submitted an entry that matched the Ensemble’s 10.09% only for the Ensemble to trump that 20 minutes later with a score of 10.10%, 4 minutes before the submissions closed.

UPDATE 2: Simon Owens has posted an interview with one of the members of the winning Ensemble team.

UPDATE 3: The Ensemble themselves have posted an account of the nail-biting final minutes of the competition.

First Qualifying Solution Submitted for $1 Million Netflix Prize

Posted in Software Development, The Internet by Dan on June 26th, 2009

The word on the street (well, Reddit actually) is that the BellKor’s Pragmatic Chaos team today submitted the first qualifying solution for the Netflix Prize.  If nobody submits a better solution within the next 30 days then they will claim the $1 million reward that has so far eluded the best efforts of thousands of programmers and researchers since the competition was launched in October 2006.

Netflix is a US-based online DVD rental service.  One of their features is that they make movie recommendations to customers based on their previous viewing history.  In order to improve their recommendations system, Netflix has been offering a million dollar reward to any individual or team that is able to develop software that increases the accuracy of these recommendations by at least 10%.
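For context, the Netflix Prize measured accuracy as the root mean squared error (RMSE) of predicted ratings against actual ratings, and the 10% target was relative to the RMSE of Netflix's own Cinematch system. The sketch below shows that arithmetic only; the class and method names are illustrative and this is not Netflix's scoring code.

```java
/** Illustrative helpers for RMSE and percentage improvement over a baseline. */
final class RecommendationScore {
    private RecommendationScore() {}

    /** Root mean squared error between predicted and actual ratings. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and the same length");
        }
        double sumSquares = 0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquares += error * error;
        }
        return Math.sqrt(sumSquares / predicted.length);
    }

    /** Percentage by which candidateRmse improves on baselineRmse (lower RMSE is better). */
    static double improvementPercent(double baselineRmse, double candidateRmse) {
        return 100.0 * (baselineRmse - candidateRmse) / baselineRmse;
    }
}
```

Under this definition a 10% improvement means the candidate's RMSE must be no more than 0.9 times the baseline's, so improvementPercent(1.0, 0.9) is (within floating-point rounding) 10.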

The financial rewards and intellectual challenge of the Netflix Prize have encouraged almost 50,000 individuals and teams to attempt to solve the problem using a vast array of different AI and data-mining techniques.

The BellKor team have overcome such obstacles as the Napoleon Dynamite problem and will no doubt have the champagne on ice while they nervously wait to see if anybody else is able to surpass their results within the next month.

Practical Evolutionary Computation: Elitism

Posted in Evolutionary Computation, Java, Software Development by Dan on February 12th, 2009

In my previous article about evolutionary computation, I glossed over the concept of elitism. The Watchmaker Framework’s evolve methods require you to specify an elite count. I told you to set this parameter to zero and forget about it. This brief article ties up that loose end by explaining how to use elitism to improve the performance of your evolutionary algorithm.

In an evolutionary algorithm (EA), sometimes good candidates can be lost when cross-over or mutation results in offspring that are weaker than the parents. Often the EA will re-discover these lost improvements in a subsequent generation but there is no guarantee of this. To combat this we can use a feature known as elitism. Elitism involves copying a small proportion of the fittest candidates, unchanged, into the next generation. This can sometimes have a dramatic impact on performance by ensuring that the EA does not waste time re-discovering previously discarded partial solutions. Candidate solutions that are preserved unchanged through elitism remain eligible for selection as parents when breeding the remainder of the next generation.

NOTE: One potential downside of elitism is that it may make it more likely that the evolution converges on a sub-optimal local maximum.

The Watchmaker Framework supports elitism via the second parameter to the evolve method of an EvolutionEngine. This elite count is the number of candidates in a generation that should be copied unchanged from the previous generation, rather than created via evolution. Collectively these candidates are the elite. So for a population size of 100, setting the elite count to 5 will result in the fittest 5% of each generation being copied, without modification, into the next generation.
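The mechanism can be sketched generically. This is not the Watchmaker Framework's implementation; it assumes a hypothetical breed function that stands in for selection, cross-over and mutation and produces one new candidate per call. With a population of 100 and eliteCount set to 5, the five fittest candidates pass through unchanged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Generic sketch of elitism in a generational evolutionary algorithm. */
final class Elitism {
    private Elitism() {}

    static <T> List<T> nextGeneration(List<T> population,
                                      ToDoubleFunction<T> fitness,
                                      int eliteCount,
                                      Function<List<T>, T> breed) {
        if (eliteCount < 0 || eliteCount > population.size()) {
            throw new IllegalArgumentException("eliteCount must be between 0 and the population size");
        }
        // Rank a copy of the population, fittest first.
        List<T> ranked = new ArrayList<>(population);
        ranked.sort(Comparator.comparingDouble(fitness).reversed());
        // The elite are copied, unchanged, into the next generation...
        List<T> next = new ArrayList<>(ranked.subList(0, eliteCount));
        // ...and the remaining slots are filled by evolution. breed() sees the whole
        // current population, so elite candidates can still be selected as parents.
        while (next.size() < population.size()) {
            next.add(breed.apply(population));
        }
        return next;
    }
}
```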

If you run the Hello World example from the previous article both with and without elitism, you will see that it completes in fewer generations with elitism enabled (22 generations vs. 40 when I ran it – though your mileage may vary due to the random nature of the evolution).

Source code for the Hello World example (and several other, more interesting evolutionary programs) is included in the download.

This is the third in a short and irregular series of articles on practical Evolutionary Computation, based on the work-in-progress documentation for the Watchmaker Framework for Evolutionary Computation.  The first article provided an introduction to the field of evolutionary computation and the second article showed how to implement a simple evolutionary algorithm using the Watchmaker Framework.

Further Reading

An Introduction to Genetic Algorithms by Melanie Mitchell
Introduction to Evolutionary Computing by A.E. Eiben and J.E. Smith
The Blind Watchmaker by Richard Dawkins

Waiting for the Magic – The Great Twitter Experiment, Day 1

Posted in Software Development by Dan on February 12th, 2009

Getting Started

So I started a Twitter account.  Signing up was not without its problems. The sign-up form has some AJAX functionality for checking whether a user name is in use or not. Except that this functionality is simply absent in Opera, and it turns out that in Twitterland I am not the only person called Dan Dyer. There is not even an error message when the registration fails. You just get returned to the sign-up form. Very annoying. It wasn’t until I tried it in Safari that I figured out what was happening. You are also limited to a 15-character username, so I could not register as “newadventuresinsoftware”.

That initial hurdle overcome, I proceeded to post my first “tweet”, a link back to the blog post announcing this little adventure. The translation of the URL to a TinyURL link tripped over the brackets that I used to surround the link and messed up my tweet. Lesson learned for next time.

The use of these shrunk and obfuscated URLs is necessitated by the strict 140-character limit. It’s not done very cleverly though. I tried to post a tweet that would be 140 characters after the link had been shrunk, but the Twitter website would not permit this.

In all, the Twitter website is pants. It’s a bad way to post messages and a bad way to follow other people. If it were the only way to interact with the service I would have aborted my trial already. Most of the Twitter pros are using some kind of desktop or mobile client. I’ve tried using the Opera widget and that’s an improvement. The next step is to settle on one of the more full-featured desktop clients.

Technical Problems

Twitter’s technical issues are legendary. I hadn’t heard much about them recently, so I assumed things had got better. But I’m only a few hours in and I’ve already experienced my first Fail Whale.

Day One Summary

As well as getting signed up and posting my first tweets, I’ve also attracted my first disciples, er, followers (thanks). I don’t really know who the best people to follow are, so as well as the few people I have picked out, I’m following everybody who follows me. That raises a question though: if I’m following you, and you’re following me, are we both lost?

At the moment I’m underwhelmed by the whole experience, but I didn’t expect to achieve enlightenment on day one. I’m just doing the ground work for the epiphany that will surely occur at some point in the next 2 weeks. Right now I’m just sitting back waiting for the magic to happen.

Practical Evolutionary Computation: An Introduction

Posted in Evolutionary Computation, Software Development by Dan on January 20th, 2009

Software is normally developed in a very precise, deterministic way. The behaviour of a computer is governed by strict logical rules. A computer invariably does exactly what it is told to do.

When writing a program to solve a particular problem, software developers will identify the necessary sub-tasks that the program must perform. Algorithms are chosen and implemented for each task. The completed program becomes a detailed specification of exactly how to get from A to B. Every aspect is carefully designed by its developers who must understand how the various components interact to deliver the program’s functionality.

This prescriptive approach to solving problems with computers has served us well and is responsible for most of the software applications that we use today. However, it is not without limitations. Solutions to problems are constrained by the intuition, knowledge and prejudices of those who develop the software. The programmers have to know exactly how to solve the problem.

Another characteristic of the prescriptive approach that is sometimes problematic is that it is best suited to finding exact answers. Not all problems have exact solutions, and some that do may be too computationally expensive to solve. Sometimes it is more useful to be able to find an approximate answer quickly than to waste time searching for a better solution.

What are Evolutionary Algorithms?

Evolutionary algorithms (EAs) are inspired by the biological model of evolution and natural selection first proposed by Charles Darwin in 1859. In the natural world, evolution helps species adapt to their environments. Environmental factors that influence the survival prospects of an organism include climate, availability of food and the dangers of predators.

Species change over the course of many generations. Mutations occur randomly. Some mutations will be advantageous, but many will be useless or detrimental. Progress comes from the feedback provided by non-random natural selection. For example, organisms that can survive for long periods without water will be more likely to thrive in dry conditions than those that can’t. Likewise, animals that can run fast will be more successful at evading predators than their slower rivals. If a random genetic modification helps an organism to survive and to reproduce, that modification will itself survive and spread throughout the population, via the organism’s offspring.

Evolutionary algorithms are based on a simplified model of this biological evolution. To solve a particular problem we create an environment in which potential solutions can evolve. The environment is shaped by the parameters of the problem and encourages the evolution of good solutions.

The field of Evolutionary Computation encompasses several types of evolutionary algorithm. These include Genetic Algorithms (GAs), Evolution Strategies, Genetic Programming (GP), Evolutionary Programming and Learning Classifier Systems.

The most common type of evolutionary algorithm is the generational genetic algorithm.  The basic outline of a generational GA is as follows (most other EA variants are broadly similar).  A population of candidate solutions is iteratively evolved over many generations. Mimicking the concept of natural selection in biology, the survival of candidates (or their offspring) from generation to generation in an EA is governed by a fitness function that evaluates each candidate according to how close it is to the desired outcome, and a selection strategy that favours the better solutions. Over time, the quality of the solutions in the population should improve. If the program is successful, we can terminate the evolution once it has found a solution that is good enough.

An Example

Now that we have introduced the basic concepts and terminology, I will attempt to illustrate by way of an example. Suppose that we want to use evolution to generate a particular character string, for example “HELLO WORLD”. This is a contrived example inasmuch as it assumes that we don’t know how to create such a string and that evolution is the best approach available to us. However, bear with me as this simple example is useful for demonstrating exactly how the evolutionary approach works.

Each candidate solution in our population will be a string. We’ll use a fixed-length representation so that each string is 11 characters long. Each character in a string will be one of the 27 valid characters (the upper case letters ‘A’ to ‘Z’ plus the space character).

For the fitness function we’ll use the simple approach of assigning a candidate solution one point for each position in the string that has the correct character. For the string “HELLO WORLD” this gives a maximum possible fitness score of 11 (the length of the string).
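The fitness function described above translates directly into code. This is a minimal sketch; the class name StringFitness is illustrative.

```java
/** One point for each position that holds the same character as the target. */
final class StringFitness {
    static final String TARGET = "HELLO WORLD";

    private StringFitness() {}

    static int fitness(String candidate) {
        if (candidate.length() != TARGET.length()) {
            throw new IllegalArgumentException("Candidates must be " + TARGET.length() + " characters long");
        }
        int score = 0;
        for (int i = 0; i < TARGET.length(); i++) {
            if (candidate.charAt(i) == TARGET.charAt(i)) {
                score++;
            }
        }
        return score;
    }
}
```

For example, fitness("ZASIB WSUVP") is 2 (the space and the 'W') and fitness("HELLO WORLD") is the maximum of 11.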

The first task for the evolutionary algorithm is to randomly generate the initial population. We can use any size population that we choose. Typical EA population sizes can vary from tens to thousands of individuals. For this example we will use a population size of 10. After the initialisation of the population we might have the following candidates (fitness scores in brackets):

  1.  GERZUNFXCEN  (1)
  2.  HSFDAHDMUYZ  (1)
  3.  UQ IGARHGJN  (0)
  4.  ZASIB WSUVP  (2)
  5.  XIXROIUAZBH  (1)
  6.  VDLGCWMBFYA  (1)
  7.  SY YUHYRSEE  (0)
  8.  EUSVBIVFHFK  (0)
  9.  HHENRFZAMZH  (1)
  10. UJBBDFZPLCN  (0)

None of these candidate solutions is particularly good. The best (number 4) has just two characters out of eleven that match the target string (the space character and the ‘W’).

The next step is to select candidates based on their fitness and use them to create a new generation.  One technique for favouring the selection of fitter candidates over weaker candidates is to assign each candidate a selection probability proportionate to its fitness.

If we use fitness-proportionate selection, none of the candidates with zero fitness will be selected and the candidate with a fitness of 2 is twice as likely to be selected as any of the candidates with a fitness of 1. For the next step we need to select 10 parents, but only six candidates have non-zero fitness, so some of them are bound to be selected multiple times.
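Fitness-proportionate selection is often implemented as a "roulette wheel": each candidate owns a slice of the wheel proportional to its fitness, and each spin picks the candidate whose slice it lands in. A minimal sketch, assuming non-negative integer fitness scores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Roulette-wheel (fitness-proportionate) selection. */
final class RouletteWheel {
    private RouletteWheel() {}

    /** Candidate i is chosen with probability fitness[i] / sum(fitness) on each spin. */
    static List<String> select(List<String> candidates, int[] fitness, int count, Random rng) {
        int total = 0;
        for (int f : fitness) {
            total += f;
        }
        if (total == 0) {
            throw new IllegalArgumentException("At least one candidate needs non-zero fitness");
        }
        List<String> parents = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            int spin = rng.nextInt(total); // uniform in 0 .. total - 1
            int i = 0;
            // Walk the wheel; zero-fitness candidates have empty slices and are always skipped.
            while (spin >= fitness[i]) {
                spin -= fitness[i];
                i++;
            }
            parents.add(candidates.get(i));
        }
        return parents;
    }
}
```

With the initial population above the fitness scores sum to 7, so candidate 4 is picked with probability 2/7 on each spin and each fitness-1 candidate with probability 1/7.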

Now that we have some parents, we can breed the next generation. We do this via a process called cross-over, which is analogous to sexual reproduction in biology. For each pair of parents, a cross-over point is selected randomly. Assuming that the first two randomly selected parents are numbers 2 and 4, if the cross-over occurs after the first four characters, we will get the following offspring:

  Parent 1:     HSFDAHDMUYZ
  Parent 2:     ZASIB WSUVP
  Offspring 1:  HSFDB WSUVP
  Offspring 2:  ZASIAHDMUYZ

This recombination has given us two new candidates for the next generation, one of which is better than either of the parents (offspring 1 has a fitness score of 3). This shows how cross-over can lead towards better solutions. However, looking at the initial population as a whole, we can see that no combination of cross-overs will ever result in a candidate with a fitness higher than 6. This is because, among all 10 original candidates, there are only 6 positions in which we have the correct character.
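Single-point cross-over on strings is just a matter of swapping tails. In this sketch the cross-over point is passed in so the worked example can be reproduced; an EA would pick it at random, for example with 1 + rng.nextInt(length - 1).

```java
/** Single-point cross-over of two equal-length strings. */
final class SinglePointCrossover {
    private SinglePointCrossover() {}

    /** Keeps each parent's first 'point' characters and swaps the remainder. */
    static String[] crossover(String parent1, String parent2, int point) {
        if (parent1.length() != parent2.length() || point < 0 || point > parent1.length()) {
            throw new IllegalArgumentException("Parents must be equal length and point in range");
        }
        String offspring1 = parent1.substring(0, point) + parent2.substring(point);
        String offspring2 = parent2.substring(0, point) + parent1.substring(point);
        return new String[] {offspring1, offspring2};
    }
}
```

Calling crossover("HSFDAHDMUYZ", "ZASIB WSUVP", 4) gives "HSFDB WSUVP" and "ZASIAHDMUYZ", matching the offspring shown earlier.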

This can be mitigated to some extent by increasing the size of the population. With 100 individuals in the initial population we would be much more likely to have the necessary building blocks for a perfect solution, but there is no guarantee. This is where mutation comes in.

Mutation is implemented by modifying each character in a string according to some small probability, say 0.02 or 0.05. This means that any single individual will be changed only slightly by mutation, or perhaps not at all.

By applying mutation to each of the offspring produced by cross-over, we will occasionally introduce correct characters in new positions. We will also occasionally remove correct characters but these bad mutations are unlikely to survive selection in the next generation, so this is not a big problem. Advantageous mutations will be propagated by cross-over and selection and will quickly spread throughout the population.
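The per-character mutation just described can be sketched as follows; the alphabet is the 27 valid characters from the example, and the rate would be a small value such as 0.02.

```java
import java.util.Random;

/** Replaces each character, independently, with probability 'rate'. */
final class StringMutation {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";

    private StringMutation() {}

    static String mutate(String candidate, double rate, Random rng) {
        char[] chars = candidate.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (rng.nextDouble() < rate) {
                // The random replacement may happen to equal the original character.
                chars[i] = ALPHABET.charAt(rng.nextInt(ALPHABET.length()));
            }
        }
        return new String(chars);
    }
}
```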

After repeating this process for dozens or perhaps even hundreds of generations we will eventually converge on our desired solution.

This is a convoluted process for finding a string that we already knew to start with. However, as we shall see later, the evolutionary approach generalises to deal with problems where we don’t know what the best solution is and therefore can’t encode that knowledge in our fitness function.

The important point demonstrated by this example is that we can arrive at a satisfactory solution without having to enumerate every possible candidate in the search space. Even for this trivial example, a brute force search would involve generating and checking approximately 5.6 quadrillion strings.
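That figure comes from counting every possible candidate: each of the 11 positions can hold any of the 27 characters.

```latex
27^{11} = 5\,559\,060\,566\,555\,523 \approx 5.6 \times 10^{15}
```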

The Outline of an Evolutionary Algorithm

  1. Genesis – Create an initial set (population) of n candidate solutions. This may be done entirely randomly or the population may be seeded with some hand-picked candidates.
  2. Evaluation – Evaluate each member of the population using some fitness function.
  3. Survival of the Fittest – Select a number of members of the evaluated population, favouring those with higher fitness scores. These will be the parents of the next generation.
  4. Evolution – Generate a new population of offspring by randomly altering and/or combining elements of the parent candidates. The evolution is performed by one or more evolutionary operators. The most common operators are cross-over and mutation. Cross-over takes two parents, cuts them each into two or more pieces and recombines the pieces to create two new offspring. Mutation copies an individual but with small, random modifications (such as flipping a bit from zero to one).
  5. Iteration – Repeat steps 2-4 until a satisfactory solution is found or some other termination condition is met (such as the number of generations or elapsed time).
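The five steps above can be pulled together into a compact sketch for the "HELLO WORLD" example, using fitness-proportionate selection, single-point cross-over and per-character mutation. Parameter values are illustrative, and the uniform fallback when every fitness is zero is an added assumption so that selection is always defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of a generational GA for the "HELLO WORLD" example. */
final class HelloWorldEA {
    static final String TARGET = "HELLO WORLD";
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";

    private HelloWorldEA() {}

    static int fitness(String s) {
        int score = 0;
        for (int i = 0; i < TARGET.length(); i++) {
            if (s.charAt(i) == TARGET.charAt(i)) {
                score++;
            }
        }
        return score;
    }

    static char randomChar(Random rng) {
        return ALPHABET.charAt(rng.nextInt(ALPHABET.length()));
    }

    // Fitness-proportionate selection; uniform pick if every fitness is zero.
    static String select(List<String> pop, int[] fit, int total, Random rng) {
        if (total == 0) {
            return pop.get(rng.nextInt(pop.size()));
        }
        int spin = rng.nextInt(total);
        int i = 0;
        while (spin >= fit[i]) {
            spin -= fit[i];
            i++;
        }
        return pop.get(i);
    }

    static String mutate(String s, double rate, Random rng) {
        char[] c = s.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (rng.nextDouble() < rate) {
                c[i] = randomChar(rng);
            }
        }
        return new String(c);
    }

    /** Returns the generation in which the target first appears, or -1 if it never does. */
    static int run(int populationSize, double mutationRate, int maxGenerations, Random rng) {
        // 1. Genesis: a random initial population.
        List<String> pop = new ArrayList<>();
        for (int n = 0; n < populationSize; n++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < TARGET.length(); i++) {
                sb.append(randomChar(rng));
            }
            pop.add(sb.toString());
        }
        for (int gen = 0; gen < maxGenerations; gen++) {
            // 2. Evaluation.
            int[] fit = new int[pop.size()];
            int total = 0;
            for (int i = 0; i < pop.size(); i++) {
                fit[i] = fitness(pop.get(i));
                if (fit[i] == TARGET.length()) {
                    return gen;
                }
                total += fit[i];
            }
            // 3 and 4. Select parents, then breed offspring by cross-over and mutation.
            List<String> next = new ArrayList<>();
            while (next.size() < populationSize) {
                String p1 = select(pop, fit, total, rng);
                String p2 = select(pop, fit, total, rng);
                int point = 1 + rng.nextInt(TARGET.length() - 1);
                next.add(mutate(p1.substring(0, point) + p2.substring(point), mutationRate, rng));
                if (next.size() < populationSize) {
                    next.add(mutate(p2.substring(0, point) + p1.substring(point), mutationRate, rng));
                }
            }
            // 5. Iteration: the offspring replace the previous generation.
            pop = next;
        }
        return -1;
    }
}
```

A call such as HelloWorldEA.run(100, 0.02, 5000, new Random()) returns the generation in which the target string first appeared; because the process is random, the number varies from run to run.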

When are Evolutionary Algorithms Useful?

Evolutionary algorithms are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques. Many optimisation problems fall into this category. It may be too computationally-intensive to find an exact solution but sometimes a near-optimal solution is sufficient. In these situations evolutionary techniques can be effective. Due to their random nature, evolutionary algorithms are never guaranteed to find an optimal solution for any problem, but they will often find a good solution if one exists.

One example of this kind of optimisation problem is the challenge of timetabling. Schools and universities must arrange room and staff allocations to suit the needs of their curriculum. There are several constraints that must be satisfied. A member of staff can only be in one place at a time, they can only teach classes that are in their area of expertise, rooms cannot host lessons if they are already occupied, and classes must not clash with other classes taken by the same students. This is a combinatorial problem that is known to be NP-hard. It is not feasible to exhaustively search for the optimal timetable due to the huge amount of computation involved. Instead, heuristics must be used. Genetic algorithms have proven to be a successful way of generating satisfactory solutions to many scheduling problems.

Evolutionary algorithms can also be used to tackle problems that humans don’t really know how to solve. An EA, free of any human preconceptions or biases, can generate surprising solutions that are comparable to, or better than, the best human-generated efforts. It is merely necessary that we can recognise a good solution if it were presented to us, even if we don’t know how to create a good solution. In other words, we need to be able to formulate an effective fitness function.

[Image: NASA ESG evolved antenna]

Engineers working for NASA know a lot about physics. They know exactly which characteristics make for a good communications antenna. But the process of designing an antenna so that it has the necessary properties is hard. Even though the engineers know what is required from the final antenna, they may not know how to design the antenna so that it satisfies those requirements.

NASA’s Evolvable Systems Group has used evolutionary algorithms to successfully evolve antennas for use on satellites. These evolved antennas (pictured) have irregular shapes with no obvious symmetry. It is unlikely that a human expert would have arrived at such an unconventional design. Despite this, when tested these antennas proved to be extremely well adapted to their purpose.

Other Examples of Evolutionary Computation in Action

Pre-requisites

There are two requirements that must be met before an evolutionary algorithm can be used for a particular problem. Firstly, we need a way to encode candidate solutions to the problem. The simplest encoding, and that used by many genetic algorithms, is a bit string. Each candidate is simply a sequence of zeros and ones. This encoding makes cross-over and mutation very straightforward, but that does not mean that you cannot use more complicated representations. In fact, most of the examples listed in the previous section used more sophisticated candidate representations. As long as we can devise a scheme for evolving the candidates, there really is no restriction on the types that we can use. Genetic programming (GP) is a good example of this. GP evolves computer programs represented as syntax trees.
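To show why the bit-string encoding makes the operators straightforward, here is a sketch using the JDK's java.util.BitSet; the class name and the explicit length parameter (BitSet does not record a fixed logical length) are illustrative choices.

```java
import java.util.BitSet;
import java.util.Random;

/** With a bit-string encoding, cross-over and mutation reduce to bit operations. */
final class BitStringOps {
    private BitStringOps() {}

    /** Returns a copy with each of the first 'length' bits flipped with probability 'rate'. */
    static BitSet mutate(BitSet bits, int length, double rate, Random rng) {
        BitSet result = (BitSet) bits.clone();
        for (int i = 0; i < length; i++) {
            if (rng.nextDouble() < rate) {
                result.flip(i);
            }
        }
        return result;
    }

    /** Single-point cross-over: bits before 'point' come from a, the rest from b. */
    static BitSet crossover(BitSet a, BitSet b, int point, int length) {
        BitSet child = new BitSet(length);
        for (int i = 0; i < length; i++) {
            child.set(i, i < point ? a.get(i) : b.get(i));
        }
        return child;
    }
}
```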

The second requirement for applying evolutionary algorithms is that there must be a way of evaluating partial solutions to the problem – the fitness function. It is not sufficient to evaluate solutions as right or wrong; the fitness score needs to indicate how right or, if your glass is half empty, how wrong a candidate solution is. So a function that returns either 0 or 1 is useless. A function that returns a score on a scale of 1 – 100 is better. We need shades of grey, not just black and white, since this is how the algorithm guides the random evolution to find increasingly better solutions.

This is the first in a short series of articles on practical Evolutionary Computation.  The text is taken from the work-in-progress documentation for the Watchmaker Framework for Evolutionary Computation.  The next article will demonstrate how to implement evolutionary algorithms in Java using the Watchmaker Framework.

Further Reading

An Introduction to Genetic Algorithms by Melanie Mitchell
Introduction to Evolutionary Computing by A.E. Eiben and J.E. Smith
The Blind Watchmaker by Richard Dawkins

The Value of a Degree

Posted in Software Development by Dan on December 31st, 2008

Bill the Lizard (if that is his real name) wrote an interesting post revisiting the perennial debate of whether a formal Computer Science education is worthwhile for programmers or not. Bill makes several good points in the post and the comments. I’m paraphrasing here but he basically accuses self-taught programmers who dismiss a university education of arguing from a position of ignorance. If you haven’t experienced it for yourself, how do you know it wouldn’t have been beneficial?

“Education: that which reveals to the wise, and conceals from the stupid, the vast limits of their knowledge.” – Mark Twain

There are comments, both in the Reddit discussion that Bill references and in response to his own article, that suggest that a CS degree is actually an indicator of a poor programmer. As a CS graduate myself, I cannot accept this hypothesis. I’ll accept that whether or not a person has a degree is not a reliable indicator of programming aptitude, but I would be stunned if there were not at least some positive correlation between formal education and performance. There will always be exceptions that buck the trend. I’ve worked with some excellent developers who have not been to university and I’ve worked with people who have the degree but don’t know how to use it.

Self-learning is a vital skill for a programmer. Even if you’ve got your degree, you can’t stop learning there if you are going to continue to be useful. I do believe that it is possible to learn through self study anything that you could learn at university, but the problem with a home-brew education is that the teacher is as ignorant as the student. You tend to learn whatever you need to know to solve the current problem. It’s a piecemeal approach. Over time you’ll learn lots but the danger is that, because you don’t know what you don’t know, there may be blind spots in your knowledge. A good university course will be structured to teach what you need to know, even if it seems irrelevant at the time. It will provide a broader education, introducing concepts and programming paradigms that may not seem particularly useful but which all contribute to building a deeper understanding of the field.

The vital word in the preceding paragraph is the one emphasised: “good”. All of this debate ignores the crucial fact that degrees are not all equal. There are good degrees, bad degrees and many points in between. This fact is perhaps under-appreciated by those who have not been through the university process. If we could factor in the quality of the degree it would make for a more reliable indicator of developer aptitude.

Hiring Graduates

If you are responsible for hiring programmers you should familiarise yourself with which universities rate best for computer science education (this list may be quite different from the overall list of top universities). Something else to consider is the content of the course. The clue is normally in the name. If it’s not called just “Computer Science” or “Software Engineering”, beware. It may be watered down by some other subject or it may be called something like “Management Information Systems”, which might suggest that more time was spent writing essays than writing code.

Q: What do you call somebody who graduates bottom of their class at medical school?

A: Doctor.

Perhaps the biggest mistake when considering the value of a degree as an indicator of programmer aptitude is treating it as a binary proposition: any degree = good, no degree = bad. This is simplistic and wrong. Getting a degree is easy as long as you can last the distance. Here in the UK, many universities will award a third class honours degree for an overall grade of 40%.  In fact, you can get a pass degree (no honours) with just 35%. Think about that for a while before calling them in for an interview. Over the course of 3 or 4 years, almost two thirds of everything that person did they got wrong and they have the certificate to prove it.

For senior developers, degrees are mostly irrelevant since they will have copious real-world experience on which they can be judged, but being able to judge the worth of a degree is useful when hiring junior developers. All else being equal, somebody who graduated top of their Computer Science class at MIT will be a better bet than somebody who has a third-class degree in HTML Programming from the South Staines Internet University.

Counting to 10 breaks the Web

Posted in JavaScript, Software Development, The Internet by Dan on December 19th, 2008

Opera Software (makers of arguably the finest web browser known to man) recently uncovered a latent Y2K-esque bug afflicting many major websites. In pioneering the revolutionary concept of a two-digit version number, Opera 10 (currently in alpha) has shone a light on the sins of dozens of shoddy JavaScript hackers.

I can’t really understand how this happened.  Even less gifted “webmasters” should be able to grasp the concept of 10 without taking their shoes and socks off.
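The offending scripts aren’t shown here, but one easy way to break on a two-digit version is browser sniffing that reads only a single character after the browser name. The sketch below assumes a user-agent string of the form `Opera/10.00 ...`; the string is illustrative and the function names are hypothetical.

```javascript
// Illustrative user-agent string; its exact format is an assumption.
const ua = "Opera/10.00 (Windows NT 5.1; U; en) Presto/2.2.0";

// Buggy sniffing: reads exactly one character after "Opera/",
// so version "10.00" is read as 1.
function buggyOperaVersion(userAgent) {
  const start = userAgent.indexOf("Opera/");
  if (start === -1) return null;
  return parseInt(userAgent.charAt(start + "Opera/".length), 10);
}

// Safer: capture every digit before the decimal point.
function operaMajorVersion(userAgent) {
  const match = /Opera\/(\d+)/.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

// A site that requires Opera 9 or later turns Opera 10 away with the buggy check.
console.log(buggyOperaVersion(ua) >= 9); // false
console.log(operaMajorVersion(ua) >= 9); // true
```

Comparing version strings as text fails in a similar way, since `"10" < "9"` is true in JavaScript.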

More thoughts on Stackoverflow.com

Posted in Software Development, The Internet by Dan on September 26th, 2008

Since my previous post on the subject, Stackoverflow.com has moved from private beta to public beta. I’ve had more time to use the site and have some more thoughts.  The criticisms here are meant to be constructive. Hopefully the feedback from users will help the Stackoverflow team to make a good site even better.

Performance

First the good news. The site has transitioned from private to public very well. Jeff and his team seem to have got it right in terms of architecture and infrastructure because, even with the increased load, it remains blindingly fast.

Front Page

In terms of usability, I think there’s more that could be done to help me find the content that I’m interested in. The default front page is, to be honest, not very useful. New questions are coming in so fast and on so many topics that displaying the most recent questions is just noise.

I would prefer to have a personalised home page that shows me relevant questions based on my previous answering/voting history. I realise that this is major new functionality and I’m not criticising the Stackoverflow team for not having it in the initial version; it makes sense to get the site up and running first. However, it would be great if this could be implemented at some point. I’m not alone on this one: it’s currently the second most requested feature.

Presently I’m finding stuff that I want to look at by going to the tags page and clicking on interesting topics. But I’m sure I’m missing out on questions that would be of interest if only I could find them.
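A personalised front page could start as simply as scoring each new question by how often the user has engaged with its tags. This is a hypothetical sketch, not how Stackoverflow works: `rankByTagAffinity`, the shape of the tag history (tag name mapped to a count of answers and votes) and the sample data are all assumptions.

```javascript
// Scores each question by summing the user's history counts for its tags,
// drops questions with no overlap, and sorts the rest highest first.
function rankByTagAffinity(questions, tagHistory) {
  const score = (q) => q.tags.reduce((sum, tag) => sum + (tagHistory[tag] || 0), 0);
  return questions
    .map((q) => ({ ...q, score: score(q) }))
    .filter((q) => q.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical history: how many questions the user has answered or voted on, per tag.
const myTagHistory = { java: 12, swing: 3, javascript: 5 };
const questions = [
  { id: 1, title: "CSS float issue", tags: ["css", "html"] },
  { id: 2, title: "Swing layout help", tags: ["java", "swing"] },
  { id: 3, title: "Closures in loops", tags: ["javascript"] },
];

console.log(rankByTagAffinity(questions, myTagHistory).map((q) => q.id)); // [ 2, 3 ]
```

A real implementation would also need to weigh recency and cope with users who have little history, but even a crude filter like this would cut down the noise of the raw "newest questions" list.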

Tag Cloud

The tag cloud on the right of the front page isn’t very helpful either. It’s ordered with the most recent first, so if I just want to view questions tagged “html”, I’m going to struggle to find the tag in the cloud. An alphabetical ordering would be more usable. Unfortunately, this has already been suggested and rejected.

Voting and Reputation

I outlined my concerns on the voting mechanism previously. In the interests of being constructive, rather than just a whiny blogger, I’ve opened new issues on the Stackoverflow Uservoice page. If you agree with me, please vote on these issues:

Addressing each of these will help in resolving The Fastest Gun in the West Problem (currently the number one voted-on issue). The problem is that early answers get the votes and later, better answers are largely ignored. Removing the penalty for down-voting will encourage more down votes where they are deserved (so an early answer that is later shown to be wrong is less likely to retain a high score). Also, if a down vote was as powerful as an up vote, people might be more careful in crafting good answers as opposed to quick answers.
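To make the asymmetry concrete, here is a worked example under assumed weights: +10 reputation to the author per up vote and -2 per down vote, compared with a hypothetical symmetric scheme where both are worth 10. The function name and the vote counts are made up for illustration.

```javascript
// Net reputation change for an answer's author. The weights are assumptions
// for illustration, not figures taken from the post.
function reputationChange(upVotes, downVotes, weights) {
  return upVotes * weights.up - downVotes * weights.down;
}

const asymmetric = { up: 10, down: 2 }; // a down vote is worth a fifth of an up vote
const symmetric = { up: 10, down: 10 }; // a down vote cancels an up vote

// A quick early answer collects 6 up votes, then 4 down votes once
// someone shows that it is wrong.
console.log(reputationChange(6, 4, asymmetric)); // 52
console.log(reputationChange(6, 4, symmetric)); // 20
```

Under the asymmetric weights the wrong answer keeps most of its reward, so answering first still pays; with equal weights, being shown to be wrong wipes out most of the gain.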

Source Control and Backups – More than just a good idea

Posted in Software Development by Dan on September 25th, 2008

Are there really software development teams out there that don’t use any form of proper source control at all, even the bad kind?  I’d like to think that it wasn’t the case but I’m not so naive.

There’s a reason that “Do you use source control?” is the first question on the Joel Test.  It’s because it’s the most important.  If you answer “no” to this question you shouldn’t be allowed to answer subsequent questions.  Even if the rest of your process is perfect, you score zero.  You failed at software development.  I could say that if your team doesn’t use source control it is a disaster waiting to happen, but more likely the disaster already happened and you haven’t noticed yet.

Of course, you and I aren’t nearly dumb enough to try developing anything more complex than “Hello World” without version control in place.  I’m sure I’m preaching to the converted.  The kind of people who read obscure software development blogs probably already know a few things about effective software development.

But how good are your back-ups?

You do have a back-up, don’t you?

If you don’t have a back-up, you are one accidental keystroke or one hardware failure away from scoring zero on the Joel Test (under my rules)… and failing at software development. Hardware will fail, people will screw up, disgruntled former employees will set fire to the building. None of these is the real problem; failing to anticipate and prepare for them is.

How often do you back-up?

There is only one right answer to this: every day. Weekly back-ups leave too much at risk. Can you really afford to have your whole team redo an entire week’s work? The first time you lose a week’s work you will switch to daily back-ups, so why not just do it now?

A melted back-up is no back-up at all

Store your back-ups off-site. You could physically take tapes to another location or you could upload files to a remote server. Just don’t leave them in the same building as the machines they are backing up.

Does it actually work?

Honestly, have you ever tried restoring your source control back-up onto a different machine?  The most comprehensive back-up plan imaginable is useless if you can’t restore the back-ups.  If you haven’t seen it working (recently) then it doesn’t work.  There’s a good time and a bad time to find out that your back-ups don’t work.  15 minutes after your source control server spontaneously combusted is the bad time.

Are you still here?  You should be checking those back-up tapes…

UPDATE: The good people of Stackoverflow are discussing what could possibly be a good excuse for not using source control.
