Carmack on Static Analysis and Functional Purity

Posted in Haskell, Software Development by Dan on October 16th, 2012

In the absence of anything worthwhile of my own to write here at the moment, I thought I’d instead highlight a couple of interesting blog posts that I’ve discovered recently by id Software’s John Carmack.

I read Carmack’s thoughts on static analysis a few weeks ago and his reports on the effectiveness of tools for analysis of C++ code chimed with my own experiences with Java tools such as FindBugs and IntelliJ IDEA.

The first step is fully admitting that the code you write is riddled with errors. That is a bitter pill to swallow for a lot of people, but without it, most suggestions for change will be viewed with irritation or outright hostility. You have to want criticism of your code.

Automation is necessary. It is common to take a sort of smug satisfaction in reports of colossal failures of automatic systems, but for every failure of automation, the failures of humans are legion. Exhortations to “write better code” plans for more code reviews, pair programming, and so on just don’t cut it, especially in an environment with dozens of programmers under a lot of time pressure. The value in catching even the small subset of errors that are tractable to static analysis every single time is huge.

In the other article, which I only read today, Carmack espouses the virtues of pure (i.e. side-effect-free) functions. His is a pragmatic approach concerned with how to exploit purity in mainstream languages, where it is entirely optional, as opposed to advocating jumping ship to Haskell. Even when 100% purity is impractical there are still benefits in minimising impurity.

A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in. In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention. Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible.

My experiences with Haskell have informed how I approach coding in other languages. In certain cases I’ve ended up taking the functional approach to such extremes that I’ve been left questioning my original choice of implementation language.

No matter what language you work in, programming in a functional style provides benefits. You should do it whenever it is convenient, and you should think hard about the decision when it isn’t convenient.

Free StackOverflow Careers Invites

Posted in Software Development, The Internet by Dan on June 30th, 2011

About 18 months ago when I was looking for contract work, I signed up for Stack Overflow Careers. This was back when you had to pay to use the service (it’s now free but registration is by invitation only – more on that in a minute). The asking price had quickly dropped from an ambitious $99 a year to a merely illegal $9 by the time I signed up.

The Careers site has evolved considerably since it launched and now allows you to create quite a sophisticated online CV that aggregates your Stack Exchange activity, open source projects (GitHub, Google Code, etc.) and more. The idea of the service is that employers can search for candidates who appear to be good matches for their vacancies. All the enquiries I had were from companies developing financial software in London, which would have been ideal except for the work and the location.

Anyway, it seems that Stack Overflow is desperate for more people to sign up for Careers. When it went free-but-invite-only I was awarded 5 invites to distribute to suitable individuals. Last week I was given 20 more invites. Today I have been given two further batches of 20 invites. I currently have 63 invites available. If you want to try out the service for yourself, all you have to do is follow this link to claim one of my invites.

ReportNG 1.1.1 – The Less Embarrassingly Bad Version

Posted in Software Development by Dan on May 21st, 2010

It seems that ReportNG 1.1 fared badly when it came into contact with the real world. It was a buggy piece of crap. If you upgraded and suffered IllegalStateExceptions or NullPointerExceptions, I’m sorry for wasting your time.

The new chronology page was the root of all the problems. The main one (symptom: IllegalStateException) was triggered when you used TestNG’s @AfterXXX annotations. My tests included only @BeforeXXX annotations so I didn’t detect the issue. I have improved the tests and fixed the cause.

Having fixed the stability issues I am left with a chronology page that has a couple of problems with the accuracy of the information it displays. These are due to invalid assumptions on my part about what the TestNG API would return (assumptions I should have tested more thoroughly). There may well be other ways to get TestNG to provide the information required to produce a worthwhile chronology but for now I have disabled it in version 1.1.1.  Compared to 1.0 this version offers i18n and a fix for problems with Gradle. We’ll just forget that 1.1 ever happened.

ReportNG has moved to GitHub. The project website is at http://reportng.uncommons.org. The SVN repository and issue tracker at Java.net are no longer in use.

Moving Projects from Java.net to GitHub

Posted in Java, Software Development by Dan on May 17th, 2010

How to move your project from Subversion on Java.net to Git on GitHub without losing the change history.

Cloning the Subversion Repository

The normal way to duplicate a Subversion repository with full history is to use the svnadmin dump and load commands. Unfortunately most SVN hosting services, including Java.net, do not provide access to svnadmin commands or direct access to the file system.

Fortunately there is another way to clone a repository, complete with all its history, that requires only read access to the repository: svnsync.

The first step is to create a local SVN repository into which you will mirror the remote Java.net repository.

svnadmin create myproject

Before cloning the contents it is important that you add a pre-revprop-change hook to your new local repository. This is a script that must complete successfully (exit code 0) before any changes to revision properties are accepted. The easiest way to do this to add an empty script and make it executable.

echo '#!/bin/sh' > myproject/hooks/pre-revprop-change
chmod +x myproject/hooks/pre-revprop/change

Bearing in mind that we want to preserve tags and branches too, not just the trunk, we can then mirror the top-level of the remote repository.

svnsync init file:///pathto/myproject https://myproject.dev.java.net/svn/myproject
svnsync sync file:///pathto/myproject

This may take a while if the repository is large and/or your connection is slow.

Stripping Java.net Web Content from the Repository

Java.net uses the project SVN repository to manage the project website, with the files stored under trunk/www. When migrating from Java.net you probably don’t want to continue with this approach. If that’s the case then you’ll want to remove all traces of the www directory from the repository. The usual way of deleting a file – removing it from the working copy and then committing – will not purge its history so instead we use the svndumpfilter command.

First dump the mirrored repository:

svnadmin dump myproject > dump

We can then remove all traces of the www directory. Any commits that only touched files under www are now empty and are dropped completely. All remaining revisions are renumbered to avoid gaps in the sequence.

svndumpfilter --drop-empty-revs --renumber-revs exclude trunk/www < dump > filtered

The final step is to restore the filtered dump in place of our local repository.

rm -rf myproject
svnadmin create myproject
svnadmin load myproject < filtered

Migrating the Repository from SVN to Git

Now that the local SVN repository contains only what we wish to keep, we are ready to migrate it to Git. To achieve this I follow Paul Dowman’s instructions.

The first step is to import the SVN repository into a new Git repository.

git svn clone file:///pathto/myproject --no-metadata -A authors.txt -t tags -b branches -T trunk myproject-git

The authors.txt file maps SVN users to Git users. Your Java.net repository may have commits attributed to the users root and httpd. You should probably just map these to your own user name. There should be one entry for each committer:

root = Your Name <you@example.com>
httpd = Your Name <you@example.com>
you = Your Name <you@example.com>
other = Someone Else <other@example.com>

Refer to Paul’s full instructions if you have branches and tags to maintain.

Pushing to GitHub

Create a new repository on GitHub and then add this as a remote for your local Git repository.

git remote add origin git@github.com:username/myproject.git

And then push:

git push origin master --tags

Job done.

ReportNG 1.1 – i18n, Gradle fix, chronological ordering

Posted in Java, Software Development by Dan on May 15th, 2010

Over the last 6 weeks or so I seem to have taken an unintentional extended break from programming (and from posting here). It’s time to get back into the swing of things and top of my TODO list was putting the finishing touches to ReportNG 1.1 (ReportNG being an alternative HTML reporting plug-in for TestNG).

This new release fixes a problem people have been having using ReportNG with Gradle. It also adds internationalisation support. So now, as well as the default English text, there is also support for French and Portuguese. The Portuguese translation was contributed by Felipe Knorr Kuhn. The French text is just something I added as a proof of concept. It is likely to be offensively bad to a native French speaker and I welcome any corrections. I’d also appreciate any translations for other languages (just open an issue in the issue tracker and attach your translated version of this file).

The other major change is the addition of the “Chronology” page.  This is something that exists in the default TestNG reports that I originally decided to leave out of ReportNG. I didn’t really have a use for it but several people have asked for something similar so I have added it.

Zeitgeist 1.0 – An Intelligent RSS News Aggregator

Posted in Java, Software Development by Dan on November 26th, 2009

I recently signed-up for GitHub. Compared to Java.net or Sourceforge, it provides a much lower barrier of entry for code hosting.  There’s no need to wait an indeterminate period of time for somebody to approve your project, you just upload it. And because it’s a DVCS, it’s easy for other people to fork your projects and submit patches. Open Source project hosting has become so straightforward, thanks to sites like GitHub and the Bazaar-based Launchpad, that it encourages developers to open up code that they might otherwise have kept to themselves. After all, why bother with local repositories and back-ups when you can get somebody else to do it for you and get free web-hosting and issue-tracking too?

I have a number of trivial and incomplete projects hosted in local Subversion repositories. I am slowly adding to GitHub those that have any worthwhile substance to them. I’m making no promises about the quality of this code, and I don’t intend to spend much time supporting it, but I’m putting it out there in case somebody might have a use for it.

First up is Zeitgeist. This is a small Java library/application for identifying common topics among a set of news articles downloaded from RSS feeds. It’s sort of like what Google News does. There is a basic HTML publisher included that generates a web page for displaying the current top news stories, including relevant pictures.

You give the program a list of RSS feeds that cover a certain topic (maybe world news, or music news, or a particular sport) and it uses non-negative matrix factorisation to detect similarities in the article contents and to group the articles by topic. The original idea comes from Programming Collective Intelligence.

The default HTML output looks a bit like the image below, but you could customise it with CSS or by hacking the default templates to modify what information is included (for example, you could add an excerpt instead of just displaying headlines).

The algorithm is not infallible and how well it works depends a lot on the feeds that you select. It’s also non-deterministic, so if you run it multiple times with the same input you will get variations in the output.  Perhaps Zeitgeist is not that useful in it’s current form but it could be used for adding on-topic news headlines to a website or as the basis for something more advanced.

Programmers’ CVs – 20 years behind the times?

Posted in Software Development, The Internet by Dan on October 24th, 2009

Take a programmer’s CV/résumé from the late 1980s and one from today and, aside from the content, what has changed?

Not much. Both will typically be approximately two pages of static, word-processed, black text on white A4 paper (or US Letter in North America). Maybe the text doesn’t always arrive on actual paper these days thanks to the miracles of electronic document transfer, but the format of the typical CV has not really changed since the demise of the typewriter.

NOTE:  Just to be clear, I am considering CVs and résumés to be equivalent. Wikipedia makes a distinction but I’m assuming that’s mainly an American thing. In the UK there is typically no distinction and the Latin term “Curriculum Vitae”, abbreviated to “C.V.”, is almost universally preferred to the French “résumé” (I’ve no idea why there isn’t an English word for this type of document).

Have we really found the optimal way of communicating our skills and experiences, or has the humble CV been neglected by the Internet revolution? The IT recruitment industry seems wedded to the Microsoft .DOC format. This is partly because of the ubiquity of Microsoft Office in the corporate world and partly because agents prefer to receive CVs in a format that they can easily edit (which is why I insist on sending my CV as a PDF).

Dynamic CVs

Where’s the innovation? Shouldn’t a CV be something more dynamic? And why are we still e-mailing attachments every time we want somebody to see our CVs?  Attached files very quickly become outdated. I often have recruiters that I’ve spoken to in the past phoning me to ask for an updated CV. If my CV was a URL, people would always be able to see the latest version (assuming I let them have access). We do have LinkedIn profiles, which are fine in the context of your LinkedIn network but don’t really work as a general purpose CV.

Fortunately, there are some people trying to drag the programmer’s CV into the 21st century. Ben Northrop’s coderscv.com provides a free online home for your programming CV. The documents are nicely presented and the timeline view is a neat way to display your own personal history. VisualCV goes even further and embraces multimedia content, though the site is not IT-specific. Maybe it’s not a good idea to have a video as part of your CV but it’s nice to have the choice.

If you are planning to break a few conventions with your CV, either on the web or in a static file, it would be useful to measure the impact of any changes that you make, so you might be interested in TrackMyCV.com, which I saw announced today on Reddit. You use it to make documents trackable in pretty much the same way spammers embed 1-pixel images in e-mails in order to see which messages get read.

StackOverflow Careers

Another attempt to bring programmer CVs into the Web 2.0 age is the recently announced StackOverflow Careers.  The main Stackoverflow site has been a phenomenal success. Co-founder Joel Spolsky has had previous success with the Joel on Software jobs board and we are reminded that he wrote a book on recruiting programmers, so this kind of job-related monetisation was the obvious next step.

Voting peculiarities and reputation anomalies aside, StackOverflow is a meritocracy of sorts and it is this that Joel and business partner Jeff Atwood are attempting to exploit. A CV posted on StackOverflow Careers will be accompanied by a reputation score and a history of contributions to the programming community. The careers feature is currently in beta and is not particularly sophisticated at present but I expect it to expand over time.

I like the idea of expanding the scope of a CV to include other online evidence of programming competence. In this case it’s StackOverflow reputation but it could be Ohloh data or information from Google Code/SourceForge. However, at $99 to list your CV for a year, the current pricing is ridiculous. Most recruitment sites charge employers but let candidates use the service for free.  Joel and Jeff are taking a fee from both sides. The justification for charging candidates is that it will ensure that the only CVs listed on the service will be from people who are actively looking for work, increasing the value of the service to potential employers. It should also mean that your CV page is kept free from advertising.

I suspect that Joel and Jeff are aware that $99 is too much but it’s easier to start out too expensive and reduce the price than it is to do the opposite. The $99 figure also serves to make the introductory offer of $29 for 3 years look more reasonable in comparison. The problem with the introductory offer is that it only runs until 9th November. The site is still in beta and the functionality for employers to sign-up has not been launched yet. So, if you sign-up now to get the reduced rate, you’ll be paying $29 to list your CV on a site that is not used by any employers. It’s an unproven platform with a boot-strapping problem. Most programmers won’t want to pay money for a speculative premium service and employers are likely to be reluctant to sign-up to search a database with so few potential hires.

Attention to Detail

Posted in Software Development by Dan on September 23rd, 2009

A thought for the day, courtesy of Landon Dyer (no relation) a.k.a DadHacker.

“Good programs do not contain spelling errors or have grammatical mistakes. I think this is probably a result of fractal attention to detail; in great programs things are correct at all levels, down to the periods at the ends of sentences in comments.”

10 Tips for Publishing Open Source Java Libraries

Posted in Java, Software Development by Dan on July 29th, 2009

One of the strengths of the Java ecosystem is the huge number of open source libraries that are available.  There are often several alternatives when you need a library that provides some specific functionality.  Some library authors make it easy to evaluate and use their libraries while others don’t.  Open source developers may not care whether their libraries are widely used but I suspect that many are at least partially motivated by the desire to see their projects succeed.  With that in mind, here’s a checklist of things to consider to give your open source Java library the best chance of widespread adoption.

1. Make the download link prominent.

If other people can’t figure out how to download your project, it’s not going to be very successful. I’m bemused by the number of open source projects that hide their download links some place obscure. Put it in a prominent location on the front page. Use the word “download” and use large, bold text so that it can’t be missed.

2. Be explicit about the licence.

Potential users will want to know whether your licensing is compatible with their project. Don’t make users have to download and unzip your software in order to find out which licence you use. Display this information prominently on the project’s home page (don’t leave it hidden away in some dark corner of SourceForge’s project pages).

3. Prefer Apache, BSD or LGPL rather than GPL.

Obviously you are free to release your library under any terms that you choose. It’s your work and you get to decide who uses it and how. That said, while the GPL may be a fine choice for end user applications, it doesn’t make much sense for libraries. If you pick a copyleft licence, such as the GPL, your library will be doomed to irrelevance.  Even the Free Software Foundation acknowledges this (albeit grudgingly), hence the existence of the LGPL.

The viral nature of the GPL effectively prevents commercial exploitation of your work.  This may be exactly what you want, but it also prevents your library from being used by open source projects that use a more permissive licence.  This is because they would have to abandon the non-copyleft licence and switch to your chosen licence. That isn’t going to happen.

4. Be conservative about adding dependencies.

Every third-party library that your library depends on is a potential source of pain for your users. They may already depend on a different version of the same library, which can lead to JAR Hell (such problems can be mitigated by using a tool such as Jar Jar Links to isolate dependencies). Injudicious dependencies can also greatly increase the size of your project and every project that uses it.  Don’t introduce a dependency unless it adds real value to your library.

5. Document dependencies.

Ideally you should bundle all dependent JARs with your distribution. This makes it much easier for users to get started. Regardless, you should document exactly which versions of which libraries your library requires. NoClassDefFoundError is not the most friendly way to communicate this information.

6. Avoid depending on a logging framework.

Depending on a particular logging framework will cause a world of pain for half of your users. Some people like to use Sun’s JDK logging classes to avoid an external dependency; and some people like to use Log4J because Sun’s JDK logging isn’t very good. SimpleLog is another alternative.

If you pick the “wrong” logging framework you force your users to make a difficult choice.  Either they maintain two separate logging mechanisms in their application, or they replace their preferred framework with the one you insisted that they use, or (more likely) they replace your library with something else.

For most small to medium sized libraries logging is not a necessity. Problems can be reported to the application code via exceptions and can be logged there.  Incidental informational logging can usually be omitted (unless you’ve written something like Hibernate, which really does need trace logging so that you can figure out what is going on).

7. If you really need logging, use an indirect dependency.

OK, so not all libraries can realistically avoid logging.  The solution is to use a logging adapter such as SLF4J.  This allows you to write log messages and your users to have the final say over which logging back-end gets used.

8. Make the Javadocs available online.

Some libraries only include API docs in the download or, worse still, don’t generate it at all.  If you’re going to have API documentation (and it’s not exactly much effort with Javadoc), put it on the website. Potential users can get a feel for an API by browsing its classes and methods.

9. Provide a minimal example.

In an ideal world your library will be accompanied by a beautiful user manual complete with step-by-step examples for all scenarios. In the real world all we want is a code snippet that shows how to get started with the library. Your online Javadocs can be intimidating if we don’t know which classes to start with.

10. Make the JAR files available in a Maven repository.

This one that I haven’t really followed through on properly for all of my projects yet, though I intend to. That’s because I don’t use Maven, but some people like to. These people will be more likely to use your library if you make the JAR file(s) available in a public Maven repository (such as Java.net’s). You don’t have to use Maven yourself to do this as there is a set of Ant tasks that you can use to publish artifacts.

Netflix Prize Snatched Away at Last Moment?

Posted in Software Development, The Internet by Dan on July 26th, 2009

30 days ago, the BellKor’s Pragmatic Chaos team submitted the first qualifying solution for the $1 million Netflix prize.  The prize is awarded to the best performing solution 30 days after first submission that achieves the 10% improvement threshold.

BellKor achieved 10.05% on 26th June and have since moved on to 10.08%.  Several teams that were close to the qualifying mark responded by forming coallitions in a frantic race to find a hybrid solution that would surpass BellKor’s mark before the end of the 30 day period.

The Ensemble is one of these super teams.  They achieved the 10% mark two days ago and then today, on the very last day of the competition, they appear to have dramatically snatched the prize with a submission that is just 0.01% better than BellKor’s.

UPDATE: BellKor subsequently submitted an entry that matched the Ensemble’s 10.09% only for the Ensemble to trump that 20 minutes later with a score of 10.10%, 4 minutes before the submissions closed.

UPDATE 2: Simon Owens has posted an interview with one of the members of the winning Ensemble team.

UPDATE 3: The Ensemble themselves have posted an account of the nail-biting final minutes of the competition.

« Older Posts