In my previous post I talked about how to reduce the size of your Java binaries without sacrificing functionality. Using ProGuard to strip out unused and redundant code, I was able to squeeze 1.4 megabytes of already-compressed JAR files (an applet plus its libraries) into 276kb. The motivation was to reduce download times and data transfer costs for network-launched software (applets and Web Start applications). An 80% size reduction is pretty impressive, but can we do better? Yes we can.
GZip
A JAR file (or Java ARchive) is simply a zip file with a different extension. In other words the contents are already compressed. Given that we’ve already removed redundant information and zipped the files, you might think that we couldn’t compress the code much further.
Files in a zip archive are compressed individually. In practice, better compression is often achieved by compressing an archive as a whole so that similarities across separate files can be exploited. For this we can use gzip. Compressing the 276kb JAR file results in a 251kb GZipped JAR file. This is a reduction of about 9%. Good but not spectacular.
GZip is more effective when its input is uncompressed. If we expand the 276kb JAR file, repack it as an uncompressed JAR (use jar -0) and then gzip that, the resultant jar.gz file is a mere 193kb. Now that’s more like it, a 30% reduction on our already spartan 276kb binary.
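If you would rather script the repack-and-gzip step than run jar -0 and gzip by hand, something along these lines should do the job. It is only a rough sketch using java.util.zip from the standard library; the class name StoreThenGzip and the method repack are made up for this illustration. It copies every entry as STORED (uncompressed) and wraps the whole archive in a GZIP stream. It reads each entry fully into memory and gives the manifest no special treatment, so it is not a replacement for the jar tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: rewrites a JAR so that entries are stored
// uncompressed, then gzips the archive as a whole.
final class StoreThenGzip {

    static void repack(InputStream jarIn, OutputStream gzOut) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(jarIn);
             ZipOutputStream zout = new ZipOutputStream(new GZIPOutputStream(gzOut))) {
            zout.setMethod(ZipOutputStream.STORED);
            ZipEntry source;
            while ((source = zin.getNextEntry()) != null) {
                byte[] data = readEntry(zin);

                // STORED entries must declare size and CRC before they are written.
                CRC32 crc = new CRC32();
                crc.update(data);
                ZipEntry target = new ZipEntry(source.getName());
                target.setMethod(ZipEntry.STORED);
                target.setSize(data.length);
                target.setCompressedSize(data.length);
                target.setCrc(crc.getValue());

                zout.putNextEntry(target);
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Reads the remainder of the current zip entry into a byte array.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

Calling StoreThenGzip.repack with a FileInputStream for the original JAR and a FileOutputStream for the jar.gz file would produce the same kind of file as the manual steps, though the exact size will depend on the contents.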
At this point it is worth noting that you can’t just embed a jar.gz file in an HTML page. It won’t work. Instead what you need to do is set up content-negotiation on the web server so that when a browser requests the vanilla applet JAR file it receives the GZipped version. This is actually pretty straightforward and we’ll cover that shortly. But first, I don’t think that 193kb is anywhere near small enough. Can we do better? Yes we can.
Pack200
We’ve reached the limits of what we can achieve with general purpose compression techniques. We’ve also reached a lower limit for applets, since we are reliant on the browser for decompression. For Web Start applications though it’s a different story.
The Sun JDK includes a little-known tool called pack200. It is a compression utility designed specifically for compressing JAR files. Because Pack200 understands the class file format used by the archive contents, it is able to make optimisations that are unavailable to general purpose tools. Pack200 restructures the archive and the class files it contains and then GZips the result.
At this point I really didn’t think there was much scope for further reductions in size. I was wrong. That 276kb JAR that we started with, the one that started out as 1.4mb of compressed Java bytecode, the one that was squashed to just 193kb by GZip, was reduced to a tiny 81 kilobytes after Pack200 had finished with it.
Over the course of two blog posts, I’ve reduced my data transfer requirements by 94%. Despite the difference in size, the two programs are functionally equivalent. Of course, I shouldn’t pretend that compression is completely free. The client machine will have to unpack the archive. This will increase start-up processing, but since smaller files are downloaded quicker, it’s still likely to be faster overall.
Content-Negotiation
Using either Pack200 or GZip to compress network-launched Java applications requires content-negotiation on the web server. The client tells the web server what encodings it supports and the web server responds with the most appropriate option. In the case of applets, the client is the web browser. None of the browsers that I have tested support Pack200, but they will all accept GZip. For Web Start applications, the javaws launcher does accept Pack200 so that will be used where available.
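To make the negotiation a bit more concrete, here is a minimal sketch of the decision the server has to make. It assumes the client advertises support through the Accept-Encoding request header using the tokens pack200-gzip and gzip, and that the packed files sit next to the original with .pack.gz and .gz suffixes. The class JarNegotiation, its method names and the two boolean flags are hypothetical, and the parsing is deliberately naive (it ignores q-values, for instance).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that picks which variant of a JAR to serve.
final class JarNegotiation {

    // Returns the file name to serve for the requested JAR, preferring
    // Pack200 over GZip over the plain JAR.
    static String chooseVariant(String jarName, String acceptEncoding,
                                boolean hasPack200, boolean hasGzip) {
        List<String> accepted = parse(acceptEncoding);
        if (hasPack200 && accepted.contains("pack200-gzip")) {
            return jarName + ".pack.gz";
        }
        if (hasGzip && accepted.contains("gzip")) {
            return jarName + ".gz";
        }
        return jarName;
    }

    // Splits an Accept-Encoding header into lower-case tokens,
    // dropping any parameters such as ";q=0.5".
    static List<String> parse(String header) {
        if (header == null) {
            return Collections.emptyList();
        }
        List<String> tokens = new ArrayList<>();
        for (String part : header.split(",")) {
            String token = part.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Whichever variant is chosen, the response also needs a matching Content-Encoding header (pack200-gzip or gzip) so that the client knows how to unpack what it receives.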
Sun’s Pack200 page describes a servlet that can perform the necessary content-negotiation. However, if you’re not already running a servlet container, it’s probably easier to use the features of your web server. Chris Nokleberg has written some straightforward instructions on how to achieve this with Apache.