Squarefree succumbs to the Digg effect
Sunday, September 24th, 2006Yesterday, at around 4pm, I noticed that the content on squarefree.com was missing, and the main page was an empty directory listing. I ssh'ed to my web server and noticed that the "squarefree.com" directory had been renamed to "squarefree.com_DISABLED_BY_DREAMHOST". Then I checked my email and saw a message from DreamHost support:
Hello,
I just had to disable your site squarefree.com as it's coming under some load and spawning countless php processes that are crashing the webserver. I wasn't able to figure out exactly what's going on, as leaving it up for more than a minute pretty much toasts the server. Please don't re-enable it until you've figured out what's going on, or disabled any possibly problematic php.
Thanks,
James
I jumped into #dreamhost on irc.freenode.net and started looking through my web server logs for suspicious requests. I was expecting to find that my blog had been DDoSed, perhaps by someone trying to leave comment spam. Instead, I found a large number of requests for non-existant files, falling into two categories:
- Requests for favicon.ico, a file that does not exist on my site. Some of these requests are expected: most browsers with tabs request favicon.ico to display it in the tab bar. But there were also hundreds of IP addresses that requested nothing but favicon.ico for the entire day, and some requested it many times. About 100 of these IPs were Internet Explorer users with the Google Toolbar, so apparently I was getting DDoS'ed by a bug in the Google Toolbar. Another 100 were Firefox users; I haven't figured out why Firefox would request nothing but favicon.ico over and over.
- Requests due to people using my Real-time HTML Editor to edit pages that used relative URLs for images, iframes, etc. One user made dozens of requests for a file named "border=0". Another user made a request for 14 gif files every time the editor refreshed. I also saw from referrers that the Real-time HTML Editor had been featured on Digg, greatly increasing its traffic.
But why would 404 requests create PHP processes? Due to a recent change in WordPress, Apache was directing each 404 request to WordPress. WordPress used to put detailed rules in .htaccess -- for example, it would ask Apache to direct requests for http://www.squarefree.com/2005/ to WordPress using RewriteRule ^([0-9]{4})/?$. But newer versions of WordPress instead ask Apache to send it all requests for nonexistent files. I imagine this puts less strain on Apache when a site uses lots of WordPress Pages, but it hurts when a site gets lots of 404 requests. Several months ago, I had instructed WordPress to serve my custom 404 page for these requests, but WordPress still had to do a lot of work to determine that the requests should be treated as 404s.
Once I realized what had happened, and determined that reconfiguring WordPress would be difficult, I did what I could to reduce the number of 404 requests WordPress would have to handle. I created a tiny favicon.ico file so those requests wouldn't be 404s, and I moved the Real-time HTML Editor onto its own subdomain so WordPress wouldn't handle the 404s it causes. My site was only down for 40 minutes, with the Real-time HTML Editor down a little longer while I waited for the new subdomain's DNS to propagate.
Some things DreamHost could have done better:
- It would have been nice if James had disabled PHP for my domain instead of disabling my site entirely. Pornzilla did not need to be down due to PHP problems.
- A per-user process limit might have allowed my site to send "503 Service Unavailable" in response to some requests instead of being down entirely. It would have also prevented my site from causing problems for other sites on the shared server.
- Better performance diagnostics would have helped both James and me isolate the problem. For example, it would have been great to have a list of PHP processes showing the request URL that caused each PHP instance to be triggered, the lifetime of each process, and perhaps some performance information (CPU used, RAM used, number of database requests).
Some things DreamHost did right:
- DreamHost allowed me to restore my site myself once I fixed the problems. All I had to do was rename "squarefree.com_DISABLED_BY_DREAMHOST" back to "squarefree.com".
- Knowing about DreamHost's .snapshot feature kept me from panicking about data loss when my site appeared to have disappeared.
- The employees in #dreamhost were helpful.
If anyone is wondering: yes, I still love DreamHost.