
Introductory REST articles

Posted in Software Development by Dan on July 8th, 2008

I found Stefan Tilkov’s REST Anti-Patterns post via DZone. It’s the third article in a series of posts on the topic of RESTful applications. Together they serve as a useful introduction.

Teach Yourself with University CS Resources

Posted in Software Development, The Internet by Dan on June 23rd, 2008

Over at DZone, I saw an article titled “Who Needs a Computer Science Degree When There’s Wikipedia?”. It suggests that you can learn as much from Wikipedia as you can by pursuing a formal university education in Computer Science. Sure, Wikipedia can be extremely informative (at least as an initial resource), but a random walk through the Wikipedia jungle could take you anywhere. It’s not a very structured syllabus.

I’ve been through a university CS education. I’m not going to argue the pros and cons of it here. Instead I’m more interested in how to acquire similar knowledge freely via the web. I’m certain that there are better approaches than trawling through Wikipedia (though Wikipedia would remain invaluable for background reading and finding references to more authoritative sources).

For me, the most obvious place to start is the universities themselves. Have a look at the Computer Science department websites and you will find that many of them provide access to course materials for anyone to download. One of the perils of teaching yourself is that you often don’t know what you don’t know. Unlike Wikipedia, the university content will be from a structured course, designed to teach the important stuff and avoid leaving huge blind spots in your knowledge.

Unlike going to university for real, you don’t have to worry about fees, academic records or geography. You get to pick from the best universities worldwide to provide your education. Leading the way is MIT with its OpenCourseWare program. This provides high-quality content tailored for remote learning. But there are many other universities that provide access to lecture notes (or videos) and exercises.

I was thinking how useful it would be if there was a website that collated links to the best online CS course materials. Then, quite by accident, I stumbled across Google’s CS Curriculum Search. This is a Google web search restricted to the CS departments of universities. It categorises the results into “Lectures”, “Assignments” and “Reference”. It seems to be a very useful tool.

The Curriculum Search is part of the Google Code University, which includes their own content related to CS topics that are important to them (e.g. distributed computing and web security).

Another resource that may prove useful is Scholarpedia, which I have mentioned before.

Great Innovations in Computing

Posted in Software Development by Dan on June 8th, 2008

Jeff Atwood poses the question “What do you think the single greatest invention in computer science is [other than the computer itself]?”

Of course, all of this depends on how strictly you define Computer Science and whether or not you include innovations that could more accurately be classified as Software Engineering or Electronic Engineering achievements.

“Computer science is no more about computers than astronomy is about telescopes.” – Edsger Dijkstra

Jeff himself, quoting Steve McConnell, opts for the “routine” (as in procedure/method/function) as the greatest invention. The comments on Coding Horror suggest alternatives, with the compiler being perhaps the most compelling contender.

My vote goes elsewhere. A strong case could be made for the Internet (in general, or the World Wide Web specifically) as the greatest invention. But, like Jeff, I’m going for something much more fundamental. The question is framed to exclude the computer itself, but I would identify one particular refinement to the design of the modern computer as being arguably the most significant milestone in Computer Science. Or perhaps more accurately, it is an advance in the design of the machines that allow practical application of a science that Dijkstra suggested did not really need machines at all.

In February 1946, the University of Pennsylvania unveiled the ENIAC, the first general-purpose, Turing-complete electronic computer. It was programmed by manipulating switches and cables. By hand. The ENIAC project had been in progress since 1943 and, by the time of its unveiling, work was already underway on its successor, the EDVAC. In June 1945, John von Neumann distributed a document entitled First Draft of a Report on the EDVAC. This 101-page document, written before the end of the Second World War, contained the outline of pretty much every general-purpose computer built since. The design of the EDVAC differed from the ENIAC in two fundamental ways. Firstly, it used binary numbers rather than decimal. Perhaps even more significantly, it was to be a stored-program computer.

By defining an instruction set and treating computations as sequences of these instructions, stored in the machine’s memory just like data, a stored-program computer is considerably more flexible than previous hard-wired machines.

Like all the best ideas it seems fairly obvious in hindsight, but the implications of this innovation are difficult to overstate. As well as the powerful flexibility it offered, the advent of the stored-program computer meant that it was now possible to write programs that wrote programs. This was an advance that would eventually lead to that other great computing invention: the compiler. Code was data.
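To make “code was data” concrete, here is a toy stored-program machine in Java. It is a sketch under loudly stated assumptions: the four-instruction set (LOAD, ADD, STORE, HALT, each occupying an opcode cell and an operand cell) is invented for illustration and bears no resemblance to the EDVAC’s real order code. Instructions and data share one int array, and the hypothetical emitAdd method is a program that writes another program into that memory before it runs.

```java
/**
 * Toy stored-program machine (hypothetical instruction set, for illustration only).
 * Instructions and data live in the same int[] memory.
 */
class StoredProgramDemo {
    // Hypothetical opcodes; each instruction occupies two cells: opcode, then operand.
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    /** Executes from address 0 until HALT, mutating and returning the shared memory. */
    static int[] run(int[] memory) {
        int pc = 0;  // program counter
        int acc = 0; // accumulator
        while (true) {
            int op = memory[pc];
            int arg = memory[pc + 1];
            pc += 2;
            switch (op) {
                case LOAD:
                    acc = memory[arg];
                    break;
                case ADD:
                    acc += memory[arg];
                    break;
                case STORE:
                    memory[arg] = acc; // writes go to the same memory that holds the code
                    break;
                case HALT:
                    return memory;
                default:
                    throw new IllegalStateException("unknown opcode " + op + " at " + (pc - 2));
            }
        }
    }

    /** A program that writes a program: stores code for memory[dst] = memory[a] + memory[b] at address 'at'. */
    static void emitAdd(int[] memory, int at, int a, int b, int dst) {
        int[] code = {LOAD, a, ADD, b, STORE, dst, HALT, 0};
        System.arraycopy(code, 0, memory, at, code.length);
    }

    public static void main(String[] args) {
        int[] memory = new int[16];
        memory[10] = 2;  // data
        memory[11] = 40; // data
        emitAdd(memory, 0, 10, 11, 12); // code is generated and stored just like data
        run(memory);
        System.out.println(memory[12]); // prints 42
    }
}
```

Nothing in the machine distinguishes a cell holding an instruction from a cell holding a number; emitAdd stores code with exactly the same array writes it would use for data.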

The stored-program approach came to be known as the von Neumann architecture, though this ignores the contributions of von Neumann’s collaborators and contemporaries. Many of the ideas were reportedly established before von Neumann became involved in the project. Nor was the EDVAC the first working stored-program computer. That honour goes to the University of Manchester’s Small-Scale Experimental Machine (SSEM or “Baby”), which became operational in June 1948.

Visual SourceSafe: A Public Service Announcement

Posted in Software Development by Dan on May 23rd, 2008

Microsoft’s Visual SourceSafe has never been a good option for revision control. Even before the emergence of the likes of Subversion, Mercurial and Git, there have always been better free solutions available.

The fact that people still use VSS in 2008 scares me.

I’m not particularly anti-Microsoft. If you choose to use Windows, or Office, or even IE – fine. But a decision to use Visual SourceSafe is not one that can be rationally defended. Too often it gets picked because it comes bundled with a Microsoft development subscription that has to be paid for anyway:

“Well, we’ve paid for it, so we might as well use it.”

“After all, something we paid for is going to be better than something free, right?”

No. No. No. DO NOT USE VISUAL SOURCESAFE FOR ANYTHING. Ever.

How Bad Can It Be?

Visual SourceSafe and I go back a long way. In my first job out of university, we used VSS. I had not been exposed to version control systems before. They didn’t teach source code management at my university in those days despite it being arguably the most important tool for professional software development.

I got on fine with VSS. It seemed to do the job. Sure it was slow but we could wait. And it only worked on Windows but we were all running NT 4.0 anyway. And sometimes the repository would get corrupted but we had backups. And we couldn’t easily check out two working copies of the same project on the same machine but we could work around this. And the exclusive checkout model was a bit restrictive but it seemed to make sense and we could just take turns editing the crucial files.

Then somebody insisted that all future projects would use CVS. This was bad – or so I thought. VSS had a reasonably friendly GUI, albeit with a few odd quirks. CVS had the intimidating CLI or some really awkward GUIs (it was a while before I would discover the excellent SmartCVS). All that merging and branching was very complicated too. Surely that wasn’t a good idea? But after a period of complaining I came to respect CVS even though it would sometimes tell me “I HATE YOU”. It was clearly a product of the Sticky-Tape and String school of software development but it worked pretty well.

Visual SourceSafe was less sticky-tape and string and more sticky-tape and a severe blow to the head.

“Visual SourceSafe? It would be safer to print out all your code, run it through a shredder, and set it on fire.” – (Attributed to an unidentified Microsoft employee).

Masquerading as a serious professional tool, VSS exhibits a staggeringly inappropriate architecture. There is no server process for VSS. Everything is based on Windows file-sharing. This obviously has serious implications for security. Client software is trusted implicitly. It also explains the poor performance and susceptibility to repository corruption. Everything is stored on the network share, even the dimensions for each user’s VSS Explorer window (there is nothing to stop you from editing your colleague’s local preferences in the shared .INI file). Maybe if you have a fast, reliable network, daily backups, and you’re very trusting, you can use VSS without major problems. Maybe.

It Gets Worse

That of course assumes that you don’t need to access your source code remotely. Unfortunately, my pre-CVS days were not the last time that I encountered SourceSafe.

I later joined another company who hadn’t got the memo. At least this time we had a solution for remote access. I say “solution”, it was more a proof-of-concept. By “proof-of-concept” I mean that somebody saw it working once. Probably.

Clearly we couldn’t just expose the network share on the Internet, so the idea was to establish a VPN connection to the office and then use the VSS client normally. Even with a broadband connection this was intolerable. The inefficiencies of SourceSafe were not always apparent on a 100Mb LAN but they were all too obvious over ADSL. Since people were usually not away from the office for more than a couple of days at a time, it was easier just to plan your work in advance and check in your changes when you got back (you just had to make sure nobody else would need to edit the same files while you were away). To add insult to injury, we were increasingly developing for Solaris. We couldn’t access SourceSafe from the SPARC machines but at least we had scp to bridge the gap.

Anyway, to cut a long story short, we eventually ditched VSS in favour of Subversion but not before I discovered my two favourite “features”. Perhaps the most entertaining is the timezone problem. If you have clients in different timezones – or even on the same LAN but with clocks out-of-sync – when one client checks in a change from “the future”, the other can’t see it until its clock has caught up.
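The timezone symptom follows from trusting each client’s clock. The toy model below assumes (this is not SourceSafe’s code) that each check-in is stamped with the committing client’s local time and that a client only treats revisions stamped no later than its own clock as visible. The Revision record and visibleTo method are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Toy model (an assumption, not SourceSafe's implementation) of the clock-skew problem. */
class ClockSkewDemo {
    /** A check-in, stamped with the committing client's local clock. */
    record Revision(String file, Instant stampedAt) {}

    /** Revisions a client treats as visible when its own clock reads clientNow. */
    static List<Revision> visibleTo(List<Revision> history, Instant clientNow) {
        List<Revision> visible = new ArrayList<>();
        for (Revision r : history) {
            if (!r.stampedAt().isAfter(clientNow)) { // hide anything "from the future"
                visible.add(r);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2008-05-23T09:00:00Z");
        List<Revision> history = new ArrayList<>();
        // Client A's clock runs 10 minutes fast, so its check-in arrives "from the future".
        history.add(new Revision("Main.java", now.plus(Duration.ofMinutes(10))));

        System.out.println(visibleTo(history, now).size()); // prints 0: client B cannot see it yet
        System.out.println(visibleTo(history, now.plus(Duration.ofMinutes(10))).size()); // prints 1
    }
}
```

Client B sees nothing until its own clock passes the time that client A’s fast clock wrote into the shared history.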

The other problem is that, once you have deleted a file, you cannot then re-add another file with the same name in the same location without purging the history of the first file. This situation will happen at least once in the life of a project (after all, developers sometimes change their minds). Once you have purged the history of the original you then cannot retrieve a complete snapshot of the project for any version that included that file. That point is worth emphasising:

VISUAL SOURCESAFE DOES NOT MAINTAIN A HISTORY OF ALL COPIES OF ALL FILES UNDER ALL CIRCUMSTANCES.

And really, if that’s the case, why bother with version control at all? Even ignoring its several other problems, SourceSafe is not fit for purpose.

Use Subversion, use Git, buy Perforce, buy BitKeeper, even use CVS if you must (though Subversion should be considered first). Just don’t use Visual SourceSafe.

NOTE: In the interests of accuracy, it is worth mentioning that my experience is primarily with version 6.0 and earlier of SourceSafe. Visual SourceSafe 2005 introduces a server process as a helper to address some of the issues, though it is really just papering over the cracks as the underlying architecture remains. It is also worth noting that there are third-party add-ons to SourceSafe to improve the remote access situation. But why pay for a patch for a defective version control system when you can get one that works for free?

Why are you still not using Hudson?

Posted in Java, Software Development by Dan on May 9th, 2008

This week Hudson was awarded the Duke’s Choice Award in the Developer Solutions category at JavaOne.

In the space of a couple of years, Hudson has come from nowhere to become the leading contender among Continuous Integration servers. It’s head and shoulders above the other free alternatives, and arguably at least as good as the commercial offerings.

The venerable CruiseControl led the way for many years, but it suffers for having been first. Configuring builds is a lot more work than it needs to be. Continuum improved on this by allowing everything to be controlled via the web interface. Continuum is simple and useful. For a while I used it and I was happy.

Then JetBrains gave away free TeamCity licences with IntelliJ IDEA 6.0 and opened my eyes to a world beyond the fairly basic functionality of Continuum. I was impressed (pre-tested commits are a neat feature), but because you needed licences for every user, I was never able to fully commit to it.

NOTE: JetBrains have since introduced a free professional edition of TeamCity.

Anyway, at some point last year I took a serious look at Hudson. I’d been vaguely aware of it for a little while but never been compelled to try it out.

Hudson is impressive. It is ridiculously easy to install. It has the same ease-of-configuration that makes Continuum so simple, but it combines it with high-end features, such as distributed builds, that are usually only found in commercial offerings like TeamCity and Atlassian’s Bamboo.

Hudson is primarily the work of Sun Microsystems’ Kohsuke Kawaguchi. Kohsuke is a prolific Open Source developer. With Hudson he has designed a system with well thought-out extension points that have enabled an army of plug-in writers to deliver a bewildering array of capabilities.

Out of the box, Hudson supports CVS and Subversion repositories. Plug-ins extend this list to include Git, Mercurial, Perforce, ClearCase, BitKeeper, StarTeam, AccuRev and Visual SourceSafe. Hudson also supports pretty much any type of build script (including Ant, Maven, shell scripts, Windows batch files, Ruby, Groovy and MSBuild).

In addition to e-mail and RSS, Hudson can also notify you of build events via IRC and Jabber as well as via a system tray/dock applet. Of course, all of these were too mundane for Kohsuke, so he built his own glowing orb.

But Hudson is much more than just a Continuous Integration server. It’s a complete build management and tracking solution. As well as publishing Javadocs, archiving build artifacts, and monitoring and graphing JUnit/TestNG results over time, you can also track and plot code coverage (Cobertura, EMMA and Clover are all supported) and coding violations (via FindBugs, Checkstyle and/or PMD). And if all that’s not enough, you can play the Continuous Integration game.

So why are you still not using Hudson?

Eat, Sleep and Drink Software Development: Finding The Zone

Posted in Software Development by Dan on May 5th, 2008

Tired Programmers Damage Your Project

I completely agree with David Heinemeier Hansson’s recent article, Sleep Deprivation is not a Badge of Honor.

I’ve seen many examples of this kind of counter-productive attitude in software development, from developers on an hourly rate contributing 100-hour-plus work weeks to maximise their pay, to teams being expected to work evenings and weekends just to be seen to be doing something to rescue late projects, even though that “something” is ultimately detrimental both to the project and to the team.

The accepted wisdom seems to be that if we can do X amount of work in 40 hours then in 80 hours we ought to be able to do, if not 2 * X, then at least more than X. This is based on the dubious assumption that some extra work is always better than no extra work. We may expend more effort but it’s quite likely that the effort will be wasted introducing bugs, making poor design decisions and doing other things that will invariably cause more work later on.

The ever-infallible Wikipedia lists confusion, loss of concentration, impatience, memory lapses, depression and psychosis among the many effects of sleep-deprivation. These don’t sound like ideal traits for a software developer.

Speaking from personal experience, the impact of tiredness on my concentration is the real productivity killer. If you put in extra hours at the beginning of the week you may find it impossible to repay the debt until the weekend, which means you’ll be working at a sub-optimal level for days.

At a company where I worked several years ago we would routinely work late into the evening to try to get more done. There was one developer, however, who was extremely reluctant to put in additional hours and generally managed to avoid doing so. This was tolerated because he was consistently the most productive member of the team and produced the best quality code. It didn’t occur to me at the time but the reason he produced the best work was probably largely that he wasn’t working stupid hours, unlike the burnt-out hackers who were checking in poorly thought-out, error-strewn code every night.

It is vital to be able to disengage from the task at hand, to go away and come back with a fresh perspective and new insights. You can’t see the big picture with your nose constantly pushed up against the canvas.

An Industry of Addicts

Tiredness often leads to programmers relying on stimulants, usually caffeine, as a substitute for adequate rest. It would appear that developers who don’t have a caffeine dependency are in the minority. I’ve had colleagues who need two cans of Red Bull just to kick-start their brains in the morning and others who drink several cups of ridiculously strong black coffee to keep them going through the day. Of course, here in Blighty, the delivery method of choice is the humble cup of tea – backbone of the British Empire.

High caffeine consumption has a whole host of nasty side-effects that complement the effects of sleep-deprivation perfectly. Insomnia is the real kicker. You need to sleep, you are knackered, but you can’t sleep because of the caffeine. So you either go to bed later or you lie awake for hours. When the alarm goes in the morning you are not recharged. You drag yourself out of bed and drink that first coffee/tea/Red Bull to get you started for the day. You are an addict, just in a slightly more socially-acceptable way than if you were smoking crack in the back alley.

“The Zone” and Peak Mental Performance

All developers are familiar with “The Zone”: that elusive state of mind where the code flows and in an hour we achieve more than we could in a week outside of the zone. What is less clear is how we get into the zone in the first place. Lack of sleep doesn’t help. If you are tired you won’t find the zone. Too much caffeine probably has a similar effect.

So what else affects our mental performance? I am ignorant in these matters but it seems reasonable that diet and general well-being would play a part. This makes Google’s strategy of free meals and snacks particularly interesting. It is more than a perk that may encourage people to work there, and more than a way of getting employees to lunch together to promote team-building and the sharing of ideas. Google is also taking control of its workers’ nutrition, rather than leaving them to subsist on Mars Bars and Coke. It would be fascinating to see a study into what kind of impact this had on staff performance and whether it came close to offsetting the apparently huge cost. The best athletes leave nothing to chance in their preparation. Nutrition is part of this. Maybe it’s the same for more intellectual endeavours?

Getters, Setters and the Great Coverage Conspiracy

Posted in Java, Software Development by Dan on April 1st, 2008

A frequent topic of Java-related blogs is whether it is worthwhile to write unit tests for simple getters and setters. This posting that I came across today proposes a reflection-based trick for eliminating much of the work in writing these tests. Maybe this is an improvement over other approaches, but what bothers me most is the motivation for wanting to test getters and setters in the first place.
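For anyone who hasn’t seen this kind of trick, the sketch below shows the general idea in Java. It is not the code from the posting: it uses the standard java.beans.Introspector to find each property that has both a getter and a setter, sets a sample value and checks that the getter returns it. The GetterSetterChecker and Person classes and the handful of sample values are hypothetical, and unsupported property types are simply skipped.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;

/** Sketch of a reflection-based getter/setter round-trip check (hypothetical, not the posting's code). */
class GetterSetterChecker {
    // Sample values for the few property types this sketch supports.
    private static final Map<Class<?>, Object> SAMPLES = Map.of(
            String.class, "sample",
            int.class, 42,
            boolean.class, true);

    /** Returns the number of properties checked; throws AssertionError on a mismatch. */
    static int check(Object bean) throws Exception {
        int checked = 0;
        // Stop at Object.class so getClass() is not treated as a property.
        for (PropertyDescriptor pd :
                Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            Method setter = pd.getWriteMethod();
            Object sample = SAMPLES.get(pd.getPropertyType());
            if (getter == null || setter == null || sample == null) {
                continue; // skip read-only, write-only and unsupported properties
            }
            setter.invoke(bean, sample);
            Object actual = getter.invoke(bean);
            if (!sample.equals(actual)) {
                throw new AssertionError(pd.getName() + ": expected " + sample + " but got " + actual);
            }
            checked++;
        }
        return checked;
    }

    /** Example bean with two trivial properties. */
    public static class Person {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check(new Person())); // prints 2
    }
}
```

All a passing run demonstrates is that a value round-trips through a pair of accessors, which is precisely the kind of test whose worth the rest of this post questions.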

It seems that many of those advocating unit testing simple getters and setters are driven by a desire to improve their coverage scores, with the actual utility of the tests a secondary concern.

Firstly, I should state that I am absolutely in favour of measuring coverage for test suites. In fact, I think it’s pretty much essential. If you are writing automated tests but not measuring code coverage then you are just scratching around in the dark. What’s great about coverage reports, particularly those that show branch coverage as well as line coverage, is that you get to see exactly where your tests are neglecting certain scenarios. Coverage reports can also be useful in highlighting code that is not used and can be removed.

The problem with code coverage is that it only shows where your tests are weak. It does not prove that your tests are good, even if the coverage is 100%. So writing tests with the sole aim of improving the coverage score is merely an exercise in self-deception. It’s the tail wagging the dog.
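A contrived example makes the point. In the hypothetical class below, coverageOnlyTest executes every line and both branches of discountedPrice, so a coverage tool would report the method as fully covered, yet it asserts nothing and the deliberately planted bug survives. Only a test that checks the result exposes it.

```java
/** Hypothetical example: full coverage without a meaningful test. */
class CoverageIllusion {
    /** Intended: 10% off orders of 100 or more. Planted bug: the discount is added, not subtracted. */
    static double discountedPrice(double price) {
        if (price >= 100) {
            return price + price * 0.10; // BUG: should be price - price * 0.10
        }
        return price;
    }

    /** Exercises both branches (100% line and branch coverage) but checks nothing. */
    static void coverageOnlyTest() {
        discountedPrice(50);
        discountedPrice(200);
    }

    /** Actually checks behaviour; returns false here because of the bug. */
    static boolean meaningfulTest() {
        return Math.abs(discountedPrice(200) - 180.0) < 1e-9;
    }

    public static void main(String[] args) {
        coverageOnlyTest();                   // "passes": no assertion can fail
        System.out.println(meaningfulTest()); // prints false: the bug is caught
    }
}
```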

If you need to add tests for all your getters and setters in order to achieve x% code coverage, where x is some mandated target, there are two questions you need to ask:

Do you have too many getters and setters?
Are you avoiding testing difficult code?

I could go on for pages about the first point. There are far too many getters and setters in most Java code. Too many developers think encapsulation is simply a case of making fields private and providing access to them with getters and setters. It would be better to aim for making fields private and not providing access to them with getters and setters. Favouring constructor-based dependency injection over setter-based DI is something else to consider (although that’s a whole other article in the making…).
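As a sketch of the alternative (hypothetical classes, with an invented flat tax rate purely for illustration): dependencies arrive through the constructor, fields are private and final, and callers ask the object to do something rather than pulling its state out through getters.

```java
import java.util.Objects;

/** Hypothetical example of constructor injection with no getters or setters. */
class ConstructorInjectionDemo {
    interface TaxPolicy {
        long taxFor(long amountInPence);
    }

    static final class Invoice {
        private final TaxPolicy taxPolicy; // injected once, never exposed
        private final long netPence;

        Invoice(TaxPolicy taxPolicy, long netPence) {
            this.taxPolicy = Objects.requireNonNull(taxPolicy);
            this.netPence = netPence;
        }

        /** Behaviour instead of getNetPence()/setTaxPolicy(): the invoice computes its own total. */
        long totalPence() {
            return netPence + taxPolicy.taxFor(netPence);
        }
    }

    public static void main(String[] args) {
        TaxPolicy flat = amount -> amount * 175 / 1000; // a 17.5% rate, for illustration only
        Invoice invoice = new Invoice(flat, 10_000);
        System.out.println(invoice.totalPence()); // prints 11750
    }
}
```

Invoice has no getters or setters at all, yet it is easy to test: pass a stub TaxPolicy through the constructor and check totalPence().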

How do you know if you have too many getters and setters? Well, your coverage reports are a good starting point. If the getters and setters are essential to your application, it will be just about impossible to avoid exercising them indirectly from other tests. If you have good coverage elsewhere but the getters and setters aren’t touched, chances are they aren’t needed. Adding more tests is not the only way of improving your test coverage. Another way is to remove code so that you have less to test.

The second question above is also important. If you require your team to achieve a rigid 75% test coverage target then you are almost guaranteeing that you will get tests for the 75% of the application that is easiest to test. Writing tests for getters and setters helps to fulfil the 75% requirement without needing to think about how to test the difficult bits of the system. Unfortunately, the other 25% is probably the code that really needs testing/refactoring.

For me it’s pretty clear. Don’t write unit tests for getters and setters. Better still, don’t write getters and setters (except where necessary). And don’t confuse test-driven development with coverage-driven development.

Maven Revisited: Fallacies and usefulness

Posted in Java, Software Development by Dan on February 3rd, 2008

The eternal love/hate debate around Maven has resurfaced recently, triggered by Howard Lewis Ship’s denunciation of Maven following his experiences on the Tapestry project (InfoQ has a summary of the recent discussions). I’ve touched on this previously with a brief echoing of Charles Miller’s sentiments.

Meanwhile, Don Brown recently announced his one-man crusade to inject some much-needed common sense into how Maven works (just addressing the first 4 items on his list would make things much better). Don followed this up by applying yet more common sense, this time to how Maven works with plugins.

All of this activity and debate prompted me to reconsider exactly what it is about Maven that makes me avoid it wherever possible. It’s certainly not the convention over configuration. Maven’s conventions might not be to everyone’s taste, but the reasoning is sound (I employ a similar convention-over-configuration approach to Ant builds, using macros). The lack of documentation is definitely a contributing factor, and it’s certainly buggier than Ant. But in the end it all comes down to the dependency management and Charles Miller’s inescapable conclusion that it is “broken by design”.

Maven’s fondness for project dependencies scattered across continents fails to take into account many of the 8 Fallacies of Distributed Computing. Even ignoring the trust issues stemming from the outsourcing of your dependency management, the more dependencies you have, the less likely you are to be able to access all of them at any given point in time. There will be failures beyond your control at some point.
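A back-of-the-envelope calculation shows why. Assume, as an idealisation, that each of n remote dependencies is independently reachable with probability p at the moment you build. The probability that all of them are reachable at once is then:

```latex
P(\text{all } n \text{ reachable}) = p^{n},
\qquad \text{e.g. } p = 0.99:\quad 0.99^{20} \approx 0.818, \qquad 0.99^{50} \approx 0.605
```

So even at 99% availability each, a build pulling 50 artifacts from remote sources would fail to resolve at least one of them roughly two times in five. Real outages are correlated (many artifacts share a repository), so treat this as an illustration of the trend rather than a prediction.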

Assuming that all of the repositories are available, you’ve then got to hope that somebody doesn’t upgrade a library that you depend on and introduce a bug into your application. If you haven’t taken the precautions to specify an exact version number, you could be in trouble. And if you have taken the precautions, why persist with this fragile nonsense? Why not just put the JAR files in the version control repository and remove most of the potential failure points? The same question occurred to Don Brown and he’s doing something about it (it’s point number 4 on his list).

So far, so negative. Are there any redeeming features of Maven? Well there is one use case where the dependency management actually makes a fair bit of sense: internal dependencies. By internal dependencies, I mean dependencies within an organisation. If your development team has multiple projects and there are dependencies between them, the Maven approach could be the solution. In this scenario, everything is within your control. The software is written by your developers and hosted on your servers that are managed by your admins.

This dynamic approach to dependency management is more flexible than copying and pasting different versions of source and/or binaries between projects. And if you limit it to internal projects, you can eliminate most of the downsides. Of course, you don’t have to use Maven for this. You may want to look at Ivy instead.

The Perils of Web Development, the Importance of Testing and Why 95% of the World Couldn’t See My Page

Posted in Software Development by Dan on December 17th, 2007

Write the web page. Check how it looks in browser of choice. Validate the XHTML. Validate the CSS. Double-check in alternative browser. Job done. Surely the worst that can happen in other browsers is that some things are a few pixels out of alignment?

Not quite. Because, despite being well-formed and perfectly valid (it even has a nice W3C badge on the bottom), most of the content was completely invisible in the two most popular web browsers.

The problem is Internet Explorer’s handling of the <script> tag. IE will not let you use a “self-closed” script tag (e.g. <script src="myscript.js" />). It’s well-formed XML, and perfectly valid XHTML, just like the other self-closed tags IE allows, but IE assumes it is unclosed and therefore ignores the remainder of the content (at least until it hits a </script> tag belonging to another script further down the page). Brilliant. I’ve no idea what the justification for this is and, to be honest, I don’t care. I’d like to curse the ineptitude of Microsoft, change the page and that would be that. But no, I can’t just ignore it because it’s a bloody conspiracy. Firefox behaves in exactly the same way. So the lesson to take from this is: Opera and Safari – sensible; IE and Firefox – insane.

To be honest, I was already aware of this bug, in IE at least, because I had come across it once before (this makes it doubly frustrating). On that occasion I was editing the XHTML with IntelliJ IDEA, which very helpfully highlighted the problematic tag with a warning that it would not work with IE. For this page I used Vim, so no such help.
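Without an IDE to flag the tag, even a crude check would have caught it. The sketch below is a regex-based scan in Java, not a real HTML or XML parser, so it will miss odd cases (such as a > inside an attribute value); the class and method names are hypothetical. The usual fix is simply an explicit end tag: <script src="myscript.js"></script>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Crude regex-based check for self-closed script tags (a sketch, not a parser). */
class SelfClosedScriptCheck {
    // "<script", then attributes without '>', then "/>" with optional whitespace; case-insensitive.
    private static final Pattern SELF_CLOSED_SCRIPT =
            Pattern.compile("<script\\b[^>]*/\\s*>", Pattern.CASE_INSENSITIVE);

    /** Returns the 1-based line numbers that contain a self-closed script tag. */
    static List<Integer> findSelfClosedScripts(String page) {
        List<Integer> lines = new ArrayList<>();
        String[] rows = page.split("\n", -1);
        for (int i = 0; i < rows.length; i++) {
            Matcher m = SELF_CLOSED_SCRIPT.matcher(rows[i]);
            if (m.find()) {
                lines.add(i + 1);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String page = String.join("\n",
                "<head>",
                "<script src=\"myscript.js\" />",     // self-closed: flagged
                "<script src=\"other.js\"></script>", // explicit end tag: fine
                "</head>");
        System.out.println(findSelfClosedScripts(page)); // prints [2]
    }
}
```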

What is most annoying, and embarrassing, is that this page has been like this for ages (I can’t remember how long), displaying nothing but the Feedburner view of this blog, and a big white space, to most visitors (and sometimes I get literally several hits a week). I admit that the rest of the content is very minimal and not particularly interesting, but I’d prefer it if it was the visitors that chose to ignore it rather than their browsers.

Yes, yes… I know I was asking for trouble by neglecting to test in either of the most popular browsers. I have learned my lesson (until next time at least). Assumption truly is the mother of all fuck-ups.

Book Review: Programming Collective Intelligence

Posted in Evolutionary Computation, Python, Software Development by Dan on December 13th, 2007

It’s called “Programming Collective Intelligence” and is presented as a book for building “Smart Web 2.0 Applications” but it is essentially an extremely accessible explanation of a wide array of machine learning and data-mining algorithms. How do sites like Amazon and Last.FM make recommendations? How do search engines work? How does Google News manage to categorise and present the most important news articles without human intervention? How do you build a useful spam filter?

All of these questions are answered and compelling example applications are built step-by-step to demonstrate the power of the ideas presented here. Decision trees, genetic algorithms, neural networks, support vector machines, genetic programming, Bayesian classifiers and non-negative matrix factorisation are some of the techniques covered and all without the dry, maths-heavy text that normally fills books on these topics.
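As a flavour of the sort of building block behind recommendations, here is a distance-based similarity score between two users’ ratings. It is written in Java rather than the book’s Python, and the 1 / (1 + Euclidean distance) form, the class name and the sample ratings are illustrative assumptions rather than code copied from the book.

```java
import java.util.Map;

/** Illustrative distance-based similarity between two users' ratings (not the book's code). */
class SimilarityDemo {
    /** 0 if no items are shared; otherwise a score in (0, 1], where 1 means identical ratings. */
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double sumOfSquares = 0;
        int shared = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) { // only compare items both users have rated
                double diff = e.getValue() - other;
                sumOfSquares += diff * diff;
                shared++;
            }
        }
        if (shared == 0) {
            return 0; // nothing in common: no basis for comparison
        }
        return 1 / (1 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        // Invented sample ratings on a 1-5 scale.
        Map<String, Double> alice = Map.of("Dune", 5.0, "Emma", 3.0);
        Map<String, Double> bob = Map.of("Dune", 4.0, "Emma", 3.0, "Ulysses", 2.0);
        Map<String, Double> carol = Map.of("Ulysses", 4.0);
        System.out.println(similarity(alice, bob));   // prints 0.5
        System.out.println(similarity(alice, carol)); // prints 0.0
    }
}
```

A recommender could rank other users by this score and weight their ratings accordingly when suggesting items the target user hasn’t yet rated.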

The examples throughout are exclusively in Python, which may have put me off had I realised this when I ordered it. I have nothing against Python except for my complete lack of experience with it. However, the examples are easy enough to understand for anybody familiar with other high-level languages. As a result of reading the book, I may actually try my hand at a bit of Python hacking now.

How well do these techniques work? Well, I’d never have found out about this book but for Amazon’s automated recommendations system. I’d thoroughly recommend this book to anyone looking to learn about interesting AI techniques without wading through opaque academic papers.

(If you find the genetic algorithms and genetic programming topics interesting, check out the Watchmaker Framework for Evolutionary Computation and some of the books recommended there.)


New Adventures in Software by Dan Dyer