More thoughts on Stackoverflow.com

Posted in Software Development, The Internet by Dan on September 26th, 2008

Since my previous post on the subject, Stackoverflow.com has moved from private beta to public beta. I’ve had more time to use the site and have some more thoughts.  The criticisms here are meant to be constructive. Hopefully the feedback from users will help the Stackoverflow team to make a good site even better.

Performance

First the good news. The site has transitioned from private to public very well. Jeff and his team seem to have got it right in terms of architecture and infrastructure because, even with the increased load, it remains blindingly fast.

Front Page

In terms of usability, I think there’s more that could be done to help me find the content that I’m interested in. The default front page is, to be honest, not very useful. New questions are coming in so fast and on so many topics that displaying the most recent questions is just noise.

I would prefer to have a personalised home page that shows me relevant questions based on my previous answering/voting history.  I realise that this is major new functionality and I’m not criticising the Stackoverflow team for not having this in the initial version, it makes sense to get the site up and running first. However, it would be great if this could be implemented at some point. I’m not alone on this one, it’s the second most popular requested feature at the moment.

Presently I’m finding stuff that I want to look at by going to the tags page and clicking on interesting topics. But I’m sure I’m missing out on questions that would be of interest if only I could find them.

Tag Cloud

The tag cloud on the right of the front page isn’t very helpful either. It’s ordered with the most recent first. If I just wanted to view questions tagged “html”, I’m going to struggle to find the tag in the cloud. An alphabetical ordering would be more usable. Unfortunately, this has already been suggested and rejected.

Voting and Reputation

I outlined my concerns on the voting mechanism previously. In the interests of being constructive, rather than just a whiny blogger, I’ve opened new issues on the Stackoverflow Uservoice page. If you agree with me, please vote on these issues:

Addressing each of these will help in resolving The Fastest Gun in the West Problem (currently the number one voted-on issue). The problem is that early answers get the votes and later, better answers are largely ignored. Removing the penalty for down-voting will encourage more down votes where they are deserved (so an early answer that is later shown to be wrong is less likely to retain a high score). Also, if a down vote was as powerful as an up vote, people might be more careful in crafting good answers as opposed to quick answers.

Source Control and Backups – More than just a good idea

Posted in Software Development by Dan on September 25th, 2008

Are there really software development teams out there that don’t use any form of proper source control at all, even the bad kind?  I’d like to think that it wasn’t the case but I’m not so naive.

There’s a reason that “Do you use source control?” is the first question on the Joel Test.  It’s because it’s the most important.  If you answer “no” to this question you shouldn’t be allowed to answer subsequent questions.  Even if the rest of your process is perfect, you score zero.  You failed at software development.  I could say that if your team doesn’t use source control it is a disaster waiting to happen, but more likely the disaster already happened and you haven’t noticed yet.

Of course, you and I aren’t nearly dumb enough to try developing anything more complex than “Hello World” without version control in place.  I’m sure I’m preaching to the converted.  The kind of people who read obscure software development blogs probably already know a few things about effective software development.

But how good are your back-ups?

You do have a back-up, don’t you?

If you don’t have a back-up you are one accidental key-stroke or one hardware failure away from scoring zero on the Joel Test (under my rules)… and failing at software development.  Hardware will fail, people will screw-up, disgruntled former employees will set fire to the building.  None of these is a problem but a failure to anticipate and prepare is.

How often do you back-up?

There is only one right answer to this: every day.  Weekly back-ups are too costly.  Can you really afford to have your whole team redo an entire week’s work?  The first time you lose a week’s work you will switch to daily back-ups, so why not just do it now?

A melted back-up is no back-up at all

Off-Site Storage. You could physically take tapes to another location or you could upload files to a remote server.  Just don’t leave them here.

Does it actually work?

Honestly, have you ever tried restoring your source control back-up onto a different machine?  The most comprehensive back-up plan imaginable is useless if you can’t restore the back-ups.  If you haven’t seen it working (recently) then it doesn’t work.  There’s a good time and a bad time to find out that your back-ups don’t work.  15 minutes after your source control server spontaneously combusted is the bad time.

Are you still here?  You should be checking those back-up tapes…

UPDATE: The good people of Stackoverflow are discussing what could possibly be a good excuse for not using source control.

Stackoverflow.com – First Impressions

Posted in Software Development, The Internet by Dan on September 12th, 2008

Over the last few days I’ve been playing with the beta of Stack Overflow. In case you are unaware, Stack Overflow is a joint venture between Jeff Atwood of Coding Horror and Joel Spolsky of Joel on Software fame. It’s basically a question and answers site for software developers. A mixture of Experts Exchange, Proggit and Wikipedia. The site is scheduled to come out of beta on Monday when it will open its doors to everyone.

From initial impressions I think it’s fair to say that the site will be a success, initially at least. Being A-list bloggers (and now podcasters too), Jeff and Joel have been able to generate a lot of exposure for their project.

Like Jeff’s blog, the minimalist site design is clean and bold and so far the whole system is very responsive (we’ll see if that’s still the case when the traffic spikes on Monday). The beta audience are already posting thousands of questions, almost all of which generate extremely prompt answers (of varying quality).

However, I think the site suffers a little from the ambitions of trying to be too many different things; is it a programming forum, or is it a Wiki?. There are a lot of different ideas in the implementation that interact via a quite complicated set of rules that have evolved over the course of the beta.

Reputation & Badges

Stack Overflow has two mechanisms for measuring a user’s standing within the community. Firstly, each user has a reputation score. This starts at 1 and increases as you make positive contributions (posting questions and answers that get voted up).  As you reach various milestones you get more privileges within the community, such as being able to vote on answers or tag other people’s questions.

Your reputation can be diminished if you get voted down or reported for abuse, but it can’t go below 1 and on the whole it’s heavily biased in favour of upward movement.

The second incentive for users to contribute is the ability to collect “badges”. This works exactly like the Cub Scouts. Some badges are easy to achieve (just fill in your profile or post your first question), and others are much harder to obtain (get 100 up votes for one of your answers).

Voting

Voting is one area of the site that I think could do with an overhaul. It’s unbalanced and not transparent enough. If your answer gets voted up you gain 10 points of reputation. But if your answer gets voted down you only lose 2 points. So if you post something that sounds plausible to the uninformed masses but is actually wrong, you could get 5 up votes and 6 down votes for a net score of -1 yet still gain a 38-point reputation boost. An up vote should have equal weight to a down vote, just like on DZone or Slashdot. It also might be better to show both the number of up votes and the number of down votes (as on DZone) rather than just the net total. This would make it easier to identify controversial content (something with 10 down votes and 12 up votes is not quite the same as something with no down votes and 2 up votes).

Another problem with the voting is that down votes penalise the voter as well as the user whose answer is being voted on. So if you post something wrong like “Java passes objects by reference”, I can either ignore it or lose 1 point of reputation for giving you the down vote that you deserve (even then it will take five of us to fully cancel out the one up vote that you got from someone who didn’t know better).

When I queried the justification for penalising down-voters I was told that it was to combat attempts to game the system. Apparently, earlier in the beta, users were posting answers to questions and then voting down everybody else’s answers so that their answer would appear at the top. The idea was that by making users pay to vote down this behaviour would be discouraged. A better solution to this problem would have been to remove the conflict of interest by not allowing users to answer and vote on the same question (which is how Slashdot’s mod points work), rather than punishing all down votes across the whole site.

The net effect of this voting system is that everybody’s reputation increases pretty quickly. Beyond the minimum score required to get full privileges the numbers can become meaningless. To avoid having to rename the site Integer Overflow there are a couple of artifical limits that restrict the number of votes you can cast and the number of reputation points you can earn each day.

Other Thoughts

Aside from my reservations about the voting, my impressions of Stackoverflow are mostly positive. The fact that it has already attracted hundreds of enthusiastic participants suggests that it has genuinely found a niche. However, I do feel that it is probably more elaborate than it needs to be (I don’t really get the need for the Wiki functionality).

Further Reading

Denton Gentry and Sara Chipps have also written about their impressions of Stack Overflow. Or, on a less positive note, you could try Crap Overflow.

Avoid NIO, Get Better Throughput

Posted in Java by Dan on September 3rd, 2008

The Java NIO (new/non-blocking I/O) API introduced in Java 1.4 is arguably the most arcane part of the standard library. With channels, selectors, byte buffers and all the associated flipping, marking, compacting, event-handling and registering/de-registering of read/write interest, it’s an entirely different level of complexity to the old-fashioned, straightforward blocking I/O. And if you want to use SSL with NIO then it’s a whole new world of pain.

Few have mastered NIO. For most it provides an opportunity to really get to know your debugger. “Should this buffer be flipped before I pass it to this method, or should the method flip it?”. BufferOverflowExceptions and memory leaks abound.

So, in the spirit of doing the simplest thing that could possibly work, writing your own NIO code is usually best avoided unless you have a compelling reason. Fortunately, some masochistic individuals have done a lot of the hard work so that we don’t have to. Projects such as Grizzly and QuickServer provide proven, reusable non-blocking server components.

However, in most instances, maybe non-blocking I/O is not necessary at all? In fact, maybe it is detrimental to performance?

That’s the point that Paul Tyma makes. He attacks some of the received wisdom about the relative merits of blocking and non-blocking servers in Java. The characteristics of JVMs and threading libraries change as new advances are made. Good advice often becomes bad advice over time, demonstrating the importance of making your own measurements rather than falling back on superstitions.

Paul’s experiments show that higher throughput is achieved with blocking I/O, that thread-per-socket designs actually scale well, and that the costs of context-switching and synchronisation aren’t always significant. Paul’s slides from his talk “Thousands of Threads and Blocking I/O: The Old Way to Write Java Servers Is New Again (and Way Better)” are well worth a look.

If you are writing your own multi-threaded servers in Java, Esmond Pitt’s Fundamental Java Networking and Java Concurrency in Practice by Brian Goetz et al. are essential reading.

Real World Haskell

Posted in Books, Haskell by Dan on September 2nd, 2008

The book Real World Haskell by Bryan O’Sullivan, Don Stewart, and John Goerzen, will be available to buy from November.

The content is also freely available online already and is well worth a look if, like me, you are keen to learn more about developing actual useful programs with Haskell.

I first mentioned Real World Haskell last year. At the time I also highlighted GHC’s LGPL problems as an obstacle that could potentially discourage the wider adoption of Haskell. It seems that some progress is being made on that front. At present though, GHC will still statically link to GMP, which means that developers who distribute GHC-compiled binaries are distributing “derivative works” as defined by the LGPL.