PHP Versus the Fail-Fast Philosophy

Posted in PHP by Dan on April 10th, 2012

Internet rants about PHP are hardly uncommon, but rarely are they as comprehensive as Eevee’s recent effort, in which the world’s favourite bad programming language is dissected with a Yegge-esque disregard for brevity.

I view PHP as bash scripting for the web: useful for getting small tasks accomplished quickly due to its ubiquity, but dangerous and just about unmaintainable when taken to extremes.

Unfortunately, over the last couple of years I’ve written and read far more PHP than is healthy (exposure to PHP, like radiation, should probably be monitored and subject to annual safe limits). If I had to pick just one complaint from the seemingly bottomless well of PHP’s shortcomings, it would be encapsulated by this quote from Eevee’s article:

When faced with either doing something nonsensical or aborting with an error, [PHP] will do something nonsensical.

This fail-slow mentality permeates the language and its type system (or lack thereof). The runtime attempts to avoid disappointing the programmer with error messages and instead carries on regardless until things either grind to a halt or, worse, fail spectacularly in splendid silence. PHP is about as helpful as a SatNav that tells you you’ve made a wrong turn, but not until 200 miles later.
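
To illustrate, here’s a minimal sketch of the fail-slow behaviour (hypothetical variable names, PHP 5-era semantics; later releases promote some of these silent failures to warnings or, eventually, fatal errors):

<?php
$config = array('timeout' => 30);

$limit = $config['timeuot'];  // Misspelled key: no error, at most a notice,
                              // and $limit quietly becomes NULL.
$wait = $limit * 2;           // NULL is silently coerced to 0, so $wait is 0.
echo "abc" + 1;               // A non-numeric string is treated as 0; prints 1.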

Understanding PHP – A Journey into the darkness…

Posted in PHP by Dan on July 31st, 2009

I knew PHP was a bit crufty before I got seriously involved with it. I’ve been trying to avoid writing a rant about how horrible it is, as the web has enough of those already and, after all, it doesn’t really matter, does it? Still, to preserve my sanity I’ve been maintaining a list of everything that’s bad about PHP, mostly for my own amusement. However, the most recent entry on my list cannot be allowed to pass without comment.

"01a4" != "001a4"

We start with something simple and non-controversial. If you have two strings that contain a different number of characters, they can’t be considered equal. The leading zeros are important because these are strings, not numbers.

"01e4" == "001e4"

However, PHP doesn’t like strings. It’s looking for any excuse it can find to treat your values as numbers. And here we have it. Change the hexadecimal characters in those strings slightly and suddenly PHP decides that these aren’t strings any more; they are numbers in scientific notation (PHP doesn’t care that you used quotes) and they are equivalent because leading zeros are ignored for numbers. To reinforce this point you will find that PHP also evaluates "01e4" == "10000" as true because these are numbers with equivalent values. This is documented behaviour; it’s just not very sensible.
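
A minimal demonstration, runnable with the PHP command-line interpreter (output as produced by the PHP 5 releases current at the time of writing):

<?php
var_dump("01a4" == "001a4");  // bool(false): not numeric, compared as strings
var_dump("01e4" == "001e4");  // bool(true): both parsed as 1e4, i.e. 10000
var_dump("01e4" == "10000");  // bool(true): same number, different notation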

Enter ===

At this point the PHP apologists chime in with the suggestion to use the === operator. This is an equality operator that compares not only the values of its arguments but their types as well: both sides must have the same type as well as identical values. This doesn’t seem like it should make any difference, as the literals on both sides of the comparison already have identical types, regardless of whether that type is string or number. Of course that’s not the case: with the extra equals sign the values remain strings rather than being interpreted as numbers, and "01e4" === "001e4" evaluates to false (correct, but not entirely convincing).
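
The corresponding checks with the extra equals sign (behaviour that, for once, is consistent across PHP versions as far as I’m aware):

<?php
var_dump("01e4" === "001e4");  // bool(false): compared character by character
var_dump("01e4" === "01e4");   // bool(true): same type, same characters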

"0x001a4" == 0x01a4

So it seems that the rule in PHP is that if the contents of a string can be parsed as a numeric literal then, for the purposes of comparison, they are, as the hexadecimal example above shows (note the difference in notation from the first example, specifically the use of the 0x prefix). Leading zeros are ignored when numbers are involved.
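
For the record (this is the behaviour of the PHP 5 releases this post was written against; PHP 7 later stopped treating hexadecimal strings as numeric, so the same comparison is false there):

<?php
var_dump("0x001a4" == 0x01a4);  // bool(true) on PHP 5: the string is parsed
                                // as hex and leading zeros are ignored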

"0012" != 0012

Unfortunately that’s not the full story, as the final example shows. Like many other languages, PHP interprets numbers beginning with a zero as octal values, but not when that number is within a string. This is completely inconsistent with the way it processes hexadecimal values and scientific notation within strings.
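
And the final case, for completeness:

<?php
var_dump("0012" == 0012);  // bool(false): the bare literal is octal (10 decimal)
                           // but the string is read as decimal 12
var_dump("0012" == "12");  // bool(true): both sides are numeric strings, so
                           // both are read as decimal 12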

Using PHPUnit with Hudson

Posted in PHP by Dan on March 25th, 2009

The problem with undocumented standards is that they tend not to be very standardised. The XML format used by Ant’s JUnitReport task has been adopted, extended and bastardised by several different testing tools, to the extent that there are at least half a dozen different dialects currently in use. This creates a problem for tools like Hudson that try to parse this inconsistent output. Currently Hudson works correctly with the XML from JUnit, TestNG (including ReportNG) and some other tools, but it doesn’t recognise the output from Google Test or PHPUnit.

I was going to make the necessary changes to Hudson so that it accepts the PHPUnit and Google Test variants, but I had some problems getting Hudson to build (yay Maven!). I may return to implementing this fix later, but for now I’ve used a quick and dirty hack that massages PHPUnit’s output into a form more acceptable to Hudson. Since I’m invoking PHPUnit from a shell script, I can use sed to make the necessary modifications. In PHPUnit’s case, I just need to eliminate the nesting of <testsuite> tags, which can be done by deleting the third and penultimate lines of the XML file:

# Tweak the test result XML to make it acceptable to Hudson.
# Count the lines in the file to locate the penultimate one.
lines=`wc -l test-results/results.xml|awk '{print $1}'`
end=`expr $lines - 1`
# Delete the penultimate line and line 3: the extra nested
# <testsuite> opening and closing tags.
sed -i "$end d;3d" test-results/results.xml

WordPress Headaches: Caching and FeedBurner

Posted in PHP, The Internet by Dan on August 1st, 2008

Still suffering a few teething problems following my attempts to Digg-proof this blog. It seems that WP Super Cache, in its default configuration at least, is incompatible with FeedBurner and its WordPress plug-in. So if you’ve been having trouble accessing my feed, this is why (for some reason it has been serving up the front-page HTML instead of the RSS feed).

I tried a few things to fix the problem. Each time it seemed to be working for a while, but it soon went wrong again. Rather than waste time figuring out what exactly is going wrong, I’ve switched back to WP-Cache.

I have to agree with Jeff Atwood that caching really ought to be core functionality for a blog publishing platform like WordPress.  Then we wouldn’t have to mess around configuring different plug-ins and trying to get them to play nicely together.

In terms of functionality, WordPress still appears to be the best option for self-hosted blogging but it’s not without its annoyances.  If I were to switch from WordPress to something else, these are some of the features I would like to see:

  • Built-in page caching.
  • Support for multiple blogs with a single installation of the software (with WordPress, you have to use a different branch of the software to achieve this).
  • Support for databases other than MySQL (PostgreSQL as a minimum, but really any mainstream SQL database should be usable).
  • A better approach to themes (I shouldn’t have to write PHP to develop themes.  With appropriately structured pages, I could probably achieve everything that I want using just CSS).
  • Integrated support for popular advertising services such as AdSense (I shouldn’t have to cut-and-paste JavaScript into PHP files).
  • Ability to import posts and comments from WordPress.

Does such a platform exist, or will I have to write it myself?

Melting Virtual Servers: The Digg Effect

Posted in PHP, The Internet by Dan on July 28th, 2008

I was right about everybody having strong opinions about code commenting. The popularity of my previous post on the subject (perhaps due to its provocative title) brought this site to its knees.

The Calm Before the Storm

I was receiving higher-than-average traffic from the regular sources, such as DZone and JavaBlogs, but this traffic was still pretty trivial and nothing that the server couldn’t handle. The real problem occurred sometime on Sunday evening. While I was busy playing football (we lost 4-0), my post was rapidly gaining attention on Digg.

The entire Uncommons.org empire is hosted on a 256MB VPS (Virtual Private Server) from Slicehost. These 256 megabytes of RAM are shared between Apache, PHP, MySQL and Jetty. Though this is a modest amount of memory, it is sufficient for the traffic that I normally attract. I was not entirely unprepared for a traffic spike. I had installed the WP-Cache plugin for WordPress some time ago to avoid unnecessary MySQL queries. I’d also tweaked the Apache configuration to make the most of the limited resources available to it (or so I thought – I guess I should take more notice of my own advice about optimisations).

Blissful Ignorance

For about two hours after the post hit the front page of Digg, I was completely oblivious to the pain being inflicted on my poor VPS. It wasn’t until I tried to visit my WordPress dashboard that I realised something wasn’t quite right. The server was not responding very promptly at all. SSH access was still available, though a little slow. The first thing I checked was the load average. It was high, but not massively so. Next I checked memory usage. Pretty predictably, all of the RAM and all of the swap space were in use.

I only use Jetty for running Hudson and, while it is pretty lightweight for a Java server, it still accounts for a lot of the memory usage on a 256MB slice. So I shut it down to release some resources.

I didn’t have to do much analysis to figure out where the traffic was coming from as my first guess was right; I quickly found my post in the Technology section of Digg. Initially I didn’t realise that it was also on the front page.

Some Figures

The post was on Digg’s front page for just over three and a quarter hours. I served up around 10,000 page impressions to around 6,000 unique visitors in that time. I’ve served over 9,000 pages so far today, at a more steady rate, and continue to receive hundreds of hits per hour. In total I’ve transferred over 3.5GB of data. Throughout this time the server has stayed up, but the site has been very slow to respond and I am sure that many visitors gave up before they got a response (some of the comments on Digg are redirecting readers to cached versions of the page because of this). If the site had been up to it, it probably would have served thousands more pages. The article has received over 2,000 “Diggs”, more than half of them after it dropped off the front page.

It could have been worse.  At least I wasn’t linked to by Yahoo!.

Remedial Action

Even after the worst of the spike, the response times were terrible. CPU load was negligible, but there was a bottleneck elsewhere, ultimately down to insufficient RAM.

Searching for potential solutions, one idea I liked a lot was using mod_rewrite to redirect all traffic from digg.com to a Coral-cached version of the page. I did set this up, but I couldn’t get Coral to cache the page because its requests to my server were timing out.

Eventually I decided to replace Apache… with Apache. The default Apache install for Ubuntu is the prefork MPM (multi-processing module) version. Apparently this process-based implementation is the more robust option, but it’s considerably more RAM-hungry than the alternative thread-based worker MPM. In my memory-constrained environment, the worker MPM seemed worth a punt.

Fortunately, Slicehost makes backups ridiculously simple, so I took a one-click snapshot of the server and then installed the worker version of Apache:

sudo apt-get install apache2-mpm-worker

This was more work than I was expecting because it involved uninstalling PHP: mod_php requires the prefork MPM, so to use the worker MPM I would have to run PHP via FastCGI instead. I found these instructions on how to set that up.

Once I’d finally got the configuration right (the web server was down for about 20 minutes), there was an immediate improvement. The server seems to be dealing with requests very well now, though I still have not restarted Jetty. Whether this configuration would have stood up to the peak force of Digg’s DDoS remains to be seen. I was also considering a switch to Lighttpd, but that would have been a bit more drastic since I’ve no experience with it.

Other Tweaks

The next item on my list is to replace WP-Cache with WP-Super-Cache so that WordPress can serve fully static pages without having to load the PHP engine each time. Other suggestions that I found include optimising the WordPress MySQL tables and installing a PHP compiler cache. Maybe I also need to set up some kind of monitoring to alert me when there is a huge traffic spike?