Archive for the 'Blogging' Category

Squarefree succumbs to the Digg effect

Sunday, September 24th, 2006

Yesterday, at around 4pm, I noticed that the content on squarefree.com was missing, and the main page was an empty directory listing. I ssh'ed to my web server and noticed that the "squarefree.com" directory had been renamed to "squarefree.com_DISABLED_BY_DREAMHOST". Then I checked my email and saw a message from DreamHost support:

Hello,

I just had to disable your site squarefree.com as it's coming under some load and spawning countless php processes that are crashing the webserver. I wasn't able to figure out exactly what's going on, as leaving it up for more than a minute pretty much toasts the server. Please don't re-enable it until you've figured out what's going on, or disabled any possibly problematic php.

Thanks,

James

I jumped into #dreamhost on irc.freenode.net and started looking through my web server logs for suspicious requests. I was expecting to find that my blog had been DDoSed, perhaps by someone trying to leave comment spam. Instead, I found a large number of requests for nonexistent files, falling into two categories:

  • Requests for favicon.ico, a file that does not exist on my site. Some of these requests are expected: most browsers with tabs request favicon.ico to display it in the tab bar. But there were also hundreds of IP addresses that requested nothing but favicon.ico for the entire day, and some requested it many times. About 100 of these IPs were Internet Explorer users with the Google Toolbar, so apparently I was getting DDoS'ed by a bug in the Google Toolbar. Another 100 were Firefox users; I haven't figured out why Firefox would request nothing but favicon.ico over and over.
  • Requests due to people using my Real-time HTML Editor to edit pages that used relative URLs for images, iframes, etc. One user made dozens of requests for a file named "border=0". Another user made a request for 14 gif files every time the editor refreshed. I also saw from referrers that the Real-time HTML Editor had been featured on Digg, greatly increasing its traffic.
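The log digging described above can be roughed out in a few lines of Python. This is a sketch, not the script that was actually used. It assumes Apache's "combined" log format and reports each client IP whose every logged request was for /favicon.ico, along with how many times it asked. The sample log lines are made up and use documentation-range IP addresses.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format, e.g.
# 192.0.2.1 - - [23/Sep/2006:15:00:01 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def favicon_only_clients(lines):
    """Return {ip: request_count} for clients that requested nothing but /favicon.ico."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # silently skip lines in any other format
            paths_by_ip[m.group("ip")].append(m.group("path"))
    return {
        ip: len(paths)
        for ip, paths in paths_by_ip.items()
        if all(p == "/favicon.ico" for p in paths)
    }

sample = [
    '192.0.2.1 - - [23/Sep/2006:15:00:01 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0"',
    '192.0.2.1 - - [23/Sep/2006:15:00:05 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0"',
    '192.0.2.2 - - [23/Sep/2006:15:00:09 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [23/Sep/2006:15:00:10 -0700] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(favicon_only_clients(sample))  # {'192.0.2.1': 2}
```

A browser that also fetched real pages, like 192.0.2.2 in the sample, is excluded. Only clients that did nothing but ask for the icon are reported.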

But why would 404 requests create PHP processes? Due to a recent change in WordPress, Apache was directing each 404 request to WordPress. WordPress used to put detailed rules in .htaccess -- for example, it would ask Apache to direct requests for http://www.squarefree.com/2005/ to WordPress using RewriteRule ^([0-9]{4})/?$. But newer versions of WordPress instead ask Apache to send it all requests for nonexistent files. I imagine this puts less strain on Apache when a site uses lots of WordPress Pages, but it hurts when a site gets lots of 404 requests. Several months ago, I had instructed WordPress to serve my custom 404 page for these requests, but WordPress still had to do a lot of work to determine that the requests should be treated as 404s.
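Here is a hypothetical Python model of the two routing styles that shows why the change matters. Only the year-archive pattern comes from the paragraph above. The request paths and the set of existing files are made up. In the old style, a request reaches WordPress only if it matches a specific rule. In the new style, any request that doesn't name an existing file or directory reaches WordPress. In reality, Apache's mod_rewrite evaluates these rules, not code like this.

```python
import re

# Old style: only specific patterns are handed to WordPress. The year-archive
# pattern is the one quoted above; a real .htaccess listed many more.
OLD_RULES = [re.compile(r"^([0-9]{4})/?$")]

def old_style_goes_to_wordpress(path, existing):
    # In a per-directory .htaccess, mod_rewrite matches without the leading "/".
    # `existing` is unused here; it keeps the signature parallel to the new style.
    relative = path.lstrip("/")
    return any(rule.match(relative) for rule in OLD_RULES)

def new_style_goes_to_wordpress(path, existing):
    # Catch-all: anything that isn't an existing file or directory.
    return path not in existing

existing = {"/", "/index.php", "/style.css"}
requests = ["/2005/", "/favicon.ico", "/editor/border=0", "/style.css"]

for name, rule in [("old", old_style_goes_to_wordpress), ("new", new_style_goes_to_wordpress)]:
    print(name, [p for p in requests if rule(p, existing)])
# old ['/2005/']
# new ['/2005/', '/favicon.ico', '/editor/border=0']
```

Under the catch-all, every stray favicon or "border=0" request now costs a PHP process rather than a cheap Apache 404.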

Once I realized what had happened, and determined that reconfiguring WordPress would be difficult, I did what I could to reduce the number of 404 requests WordPress would have to handle. I created a tiny favicon.ico file so those requests wouldn't be 404s, and I moved the Real-time HTML Editor onto its own subdomain so WordPress wouldn't handle the 404s it causes. My site was only down for 40 minutes, with the Real-time HTML Editor down a little longer while I waited for the new subdomain's DNS to propagate.

Some things DreamHost could have done better:

  • It would have been nice if James had disabled PHP for my domain instead of disabling my site entirely. Pornzilla did not need to be down due to PHP problems.
  • A per-user process limit might have allowed my site to send "503 Service Unavailable" in response to some requests instead of being down entirely. It would have also prevented my site from causing problems for other sites on the shared server.
  • Better performance diagnostics would have helped both James and me isolate the problem. For example, it would have been great to have a list of PHP processes showing the request URL that caused each PHP instance to be triggered, the lifetime of each process, and perhaps some performance information (CPU used, RAM used, number of database requests).
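The Python sketch below illustrates the per-user limit idea. It caps how many requests for one site can be rendered at once and answers "503 Service Unavailable" instead of queuing more work. The class name and the limit are hypothetical. A real shared host would enforce something like this in the web server or the operating system, not in application code.

```python
import threading

class SiteLimiter:
    """Hypothetical per-site cap on concurrently running requests."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def handle(self, render):
        # Don't wait for a free slot: shed the request immediately instead.
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable"
        try:
            return 200, render()
        finally:
            self._slots.release()

# Demo: one slow request holds the only slot, so a second one gets a 503.
limiter = SiteLimiter(max_concurrent=1)
started, finish = threading.Event(), threading.Event()

def slow_page():
    started.set()   # signal that the slot is now held
    finish.wait()   # pretend to be a long-running PHP request
    return "page"

results = []
worker = threading.Thread(target=lambda: results.append(limiter.handle(slow_page)))
worker.start()
started.wait()
rejected = limiter.handle(lambda: "page")
finish.set()
worker.join()
print(rejected)  # (503, 'Service Unavailable')
print(results)   # [(200, 'page')]
```

Refusing extra work quickly keeps one busy site from consuming every process on the machine, at the cost of some visitors seeing an error page.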

Some things DreamHost did right:

  • DreamHost allowed me to restore my site myself once I fixed the problems. All I had to do was rename "squarefree.com_DISABLED_BY_DREAMHOST" back to "squarefree.com".
  • Knowing about DreamHost's .snapshot feature kept me from panicking about data loss when my site appeared to have disappeared.
  • The employees in #dreamhost were helpful.

If anyone is wondering: yes, I still love DreamHost.

Now using wp-cache

Monday, August 15th, 2005

DreamHost sent me automated notices that I was using over 100 CPU minutes a day on slaw.dreamhost.com, a web server with over 300 accounts. In other words, I was using at least 1/60 the capacity of the quad-core server. I guessed that a lot of my CPU usage was from the 10,000 hits a day for The Burning Edge's feed, so I installed wp-cache for The Burning Edge. The plugin doesn't seem to break anything; I think it invalidates its entire cache when anything changes (except for templates).

Installing the plugin for The Burning Edge reduced my CPU usage to about 55 minutes a day, low enough to stop the automated notices but still not within the desired range of 30-40 minutes a day. I just installed it for this blog too.
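The Python below models the behavior guessed at above. It is not wp-cache's actual code, and the names are hypothetical. Every rendered page is kept until any content change throws the whole cache away. This shows why such a cache helps a feed that is fetched about 10,000 times a day but changes rarely: the expensive render runs once per change rather than once per request.

```python
class WholeSiteCache:
    """Hypothetical page cache that is emptied whenever any content changes."""

    def __init__(self, render):
        self._render = render  # the expensive step: generating the page
        self._pages = {}
        self.renders = 0

    def get(self, url):
        if url not in self._pages:
            self.renders += 1
            self._pages[url] = self._render(url)
        return self._pages[url]

    def content_changed(self):
        # Coarse but safe: a new post or comment may affect any page,
        # so every cached copy is discarded.
        self._pages.clear()

cache = WholeSiteCache(lambda url: f"<rss>{url}</rss>")
for _ in range(1000):
    cache.get("/feed/")
cache.content_changed()
for _ in range(1000):
    cache.get("/feed/")
print(cache.renders)  # 2
```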

Code in comments

Monday, May 23rd, 2005

I made some changes to my WordPress install to make it easier to post code in comments. You can now post code by enclosing it in <code> or <pre>. You still have to escape <, >, and & as &lt;, &gt;, and &amp;, but you no longer have to worry about wrapping, indentation, and smart quotes.
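Python's standard library can handle the escaping step described above. html.escape with quote=False converts only &, < and >, which is what the comment form requires. It replaces & before < and >, so the entities it produces are not escaped a second time. The helper name is only for illustration.

```python
import html

def escape_for_comment(code):
    """Escape &, < and > so code can be pasted inside <code> or <pre>."""
    # quote=False leaves " and ' untouched.
    return html.escape(code, quote=False)

snippet = 'if (a < b && b > c) { print("ok"); }'
print("<pre>" + escape_for_comment(snippet) + "</pre>")
# <pre>if (a &lt; b &amp;&amp; b &gt; c) { print("ok"); }</pre>
```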

Making these changes was harder than I expected.


Valid XHTML user script

Monday, May 16th, 2005

The Valid XHTML user script is an adaptation of the blogidate XML well-formedness bookmarklet. It shows a line of text under each textarea indicating whether the text is well-formed XHTML. When the text is not well-formed XHTML, it displays Gecko's error message and gives you a link that selects the location of the error in the textarea. When the text is well-formed XHTML, it displays links that let you check whether the XHTML is valid in addition to being well-formed.

By default, it only runs on admin posting pages for Movable Type and WordPress and on archive pages for Simon Willison's blog. You can use Greasemonkey's interface to make it run on the sites on which you edit XHTML.
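The user script relies on Gecko's XML parser. A comparable well-formedness check can be sketched in Python with xml.etree.ElementTree, whose ParseError carries an expat error code and a (line, column) position. The wrapper element is an assumption: it is needed because a post body usually has several top-level elements. Also, no DTD is loaded, so named entities such as &nbsp; are reported as errors.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString

OPEN, CLOSE = "<div>", "</div>"

def check_well_formed(fragment):
    """Return None if fragment is well-formed XML, else (message, line, column).

    Lines are 1-based and columns 0-based, as reported by expat.
    """
    try:
        ET.fromstring(OPEN + fragment + CLOSE)
    except ET.ParseError as err:
        line, column = err.position
        if line == 1:
            column = max(column - len(OPEN), 0)  # undo the wrapper's offset
        return ErrorString(err.code), line, column
    return None

print(check_well_formed("<p>Fine.</p>\n<p>Also <em>fine</em>.</p>"))  # None
print(check_well_formed("<p>Unclosed <b>bold</p>"))  # ('mismatched tag', 1, ...)
```

The returned line and column are what a "select the error" link would need to highlight the right spot in the textarea.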

Screenshots

Demo for Firefox and other Gecko browsers

New name

Monday, May 9th, 2005

This blog is now called "Indistinguishable from Jesse" instead of "Jesse Ruderman". The pages for some categories have special names, such as "Indistinguishable from Gravity" for the Physics category.

Ask Jesse answer: WordPress

Monday, May 2nd, 2005

Joey also asked:

As a fellow WordPress user (only other system I’ve ever used is Blogger), what plugins do you have installed, and do you use the bookmarklet? I love the spell checker and just got a crossposting plugin to work, crossposting to Xanga since all my friends don’t know too much, if anything, about the Internet.

I don't use the WordPress bookmarklet because I post URLs on my del.icio.us account instead of on my blog most of the time. I do use a del.icio.us bookmarklet, of course. I used the favicon picker extension to give the del.icio.us bookmarklet the del.icio.us icon and then gave it an empty name, so it takes up little space on my toolbar.

The only WordPress plugin I use is Text Control, which I use to disable WordPress's buggy auto-formatting and auto-texturizing. See my post about switching from Movable Type to WordPress for details. I haven't tried any spell checking plugins; which one do you recommend?

Ask Jesse

Sunday, May 1st, 2005

Ask me questions as comments on this entry, and I'll try to respond.

I stole this idea from Asa.

Switching from Movable Type to WordPress

Sunday, March 13th, 2005

Downloading WordPress was much easier than downloading Movable Type. Installing WordPress was very easy. Importing from Movable Type was also easy, unless you include the hours I spent cleaning up WordPress's mistakes in the import process (see "Problems I ran into after importing from Movable Type" below).

I found that WordPress could be installed in a directory with generated Movable Type content. Apache chooses the index.html generated by Movable Type over the index.php that is part of WordPress, so the Movable Type blog was still live until I deleted index.html. This approach worked better than installing WordPress in a different directory, which was the first approach I tried.
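Apache's choice here follows the order of file names in its DirectoryIndex directive. The Python model below assumes index.html is listed before index.php. That matches what happened above, but the actual order is whatever the host configured.

```python
# Assumed DirectoryIndex order; the real list comes from the server config.
DIRECTORY_INDEX = ["index.html", "index.php"]

def pick_index(files_in_directory, order=DIRECTORY_INDEX):
    """Return the file Apache would serve for a directory URL, or None."""
    for name in order:
        if name in files_in_directory:
            return name
    return None  # no index file: directory listing or an error, per config

files = {"index.html", "index.php", "wp-config.php"}
print(pick_index(files))  # index.html (Movable Type page still live)
files.discard("index.html")
print(pick_index(files))  # index.php (WordPress takes over)
```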

Advantages of WordPress over Movable Type

  • WordPress is free and will remain free.
  • When I am logged in, I get an "Edit" link for every post and comment. With Movable Type, I had added a <link rel="edit"> to every individual archive page and written a bookmarklet to follow these hidden links. WordPress's "edit" links are built-in and more convenient.
  • Templates are arranged in a way that avoids duplication. I can change the sidebar in all pages from sidebar.php, and a single edit modifies how posts appear in both category archives and monthly archives. (I had rearranged my Movable Type templates in a similar way.)
  • No rebuilds. Rebuilds in Movable Type were slow, and I had to think about what needed rebuilding after each change.
  • The default template, Kubrick, is prettier than Movable Type's default template.
  • WordPress automatically adds rel="nofollow" to every link added by commenters or trackbacks. This might be enough to keep spammers from actively seeking out WordPress 1.5 blogs to spam.
  • Trackbacks and comments are combined.
  • Permalinks for comments.
  • For each post, there is an RSS feed for comments.
  • RSS feeds for many pages. For URLs that start with "/index.php?", such as searches, change "/index.php?" to "/feed?" to get an RSS feed. For other URLs, such as category archives, simply add "/feed" to the end. If you prefer Atom to RSS, use "/feed/atom" instead of "/feed". I haven't added links to these RSS feeds yet. (With Movable Type, I had set up category feeds by adding a new template.)
  • Cruft-free post URLs. (When I enabled this feature, I had to paste rules generated by WordPress into an .htaccess file. For some reason, WordPress does not have permission to edit my .htaccess files on its own.)
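Here are the feed-URL rules from the list above, written as a hypothetical Python helper. The example.com URLs are placeholders. Applying the Atom suffix to the "/index.php?" form is an extrapolation from the rules, not something stated above.

```python
def feed_url(page_url, atom=False):
    """Map a blog page URL to its feed URL using the rules listed above."""
    suffix = "/feed/atom" if atom else "/feed"
    marker = "/index.php?"
    if marker in page_url:
        # Query-style URLs such as searches: swap "/index.php?" for "/feed?".
        return page_url.replace(marker, suffix + "?", 1)
    # Everything else: append the suffix, avoiding a doubled slash.
    return page_url.rstrip("/") + suffix

print(feed_url("http://example.com/index.php?s=firefox"))
# http://example.com/feed?s=firefox
print(feed_url("http://example.com/category/blogging/"))
# http://example.com/category/blogging/feed
print(feed_url("http://example.com/category/blogging/", atom=True))
# http://example.com/category/blogging/feed/atom
```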

Disadvantages of WordPress

  • WordPress generates every page for every request, so loading my blog is now slightly slower.
  • WordPress doesn't support If-Modified-Since, so every request results in an entire page being sent, even if the visitor has a cached copy and the page hasn't changed. When I used Movable Type (which does support If-Modified-Since), RSS feeds accounted for over 10 GB/mo. I'll have to keep an eye on my bandwidth usage for a few days. I don't know what percent of RSS readers requesting my feeds support If-Modified-Since, so I don't have a good prediction of how much bandwidth my RSS feeds will use now. I'll probably be ok, since I was only using 25 GB/mo of DreamHost's generous allotment of 120 GB/mo.
  • Kubrick (the default WordPress theme) uses images in bad ways. For example, there is a single background image for the content area and sidebar combined. Also, the backgrounds don't seem to have corresponding background colors, so the background colors are very wrong until the background images load.
  • Kubrick is too narrow for The Burning Edge because you can't see enough bugs at a time. This will take some work to fix.
  • Kubrick uses some CSS rules that use "ol li" where they should use "ol > li". "ol li" matches a list item in an unordered list that is nested in an ordered list, such as in this post. I think I can fix this without breaking Internet Explorer.
  • Kubrick's CSS doesn't take into account the different element classes on the main page and individual archive pages, so paragraphs and lists in posts look different depending on what page you're on.
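For comparison, honoring If-Modified-Since takes only a few lines. This Python sketch uses a hypothetical function and is not WordPress or Movable Type code. It returns 304 Not Modified when the client's cached copy is at least as new as the page, and 200 with a Last-Modified header otherwise. HTTP dates have one-second resolution, so microseconds are dropped before comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(last_modified, if_modified_since):
    """Return (status, headers) for a GET, honoring If-Modified-Since.

    last_modified must be a timezone-aware datetime in UTC.
    """
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed date: ignore it and send the full page
        if since is not None:
            if since.tzinfo is None:  # "-0000" dates parse as naive
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # cached copy is still current
    return 200, headers

page_time = datetime(2005, 3, 13, 12, 0, tzinfo=timezone.utc)
print(conditional_response(page_time, "Sun, 13 Mar 2005 12:00:00 GMT")[0])  # 304
print(conditional_response(page_time, "Sat, 12 Mar 2005 12:00:00 GMT")[0])  # 200
```

A 304 response has no body, which is where the bandwidth savings for frequently polled feeds come from.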

Problems I ran into after importing from Movable Type

  • Character encoding. My Movable Type content was exported as ISO-8859-1. WordPress imported this file as if it were UTF-8. As a result, many Western European characters in my posts got mangled and I had to fix them manually. I later read that some switchers avoided this problem by converting the file from ISO-8859-1 to UTF-8 using a text editor before importing. Characters not in ISO-8859-1 had already been converted to numeric HTML entities(?), so they were fine.
  • Post formatting. Movable Type has two methods of formatting posts: "None", which leaves your post alone, and "Convert line breaks", which converts double line breaks to paragraphs and leaves single line breaks alone. WordPress's import script doesn't pay attention to which Movable Type formatting method I used, and it doesn't have modes corresponding to either of Movable Type's modes. WordPress has more complex formatting: "Correct invalidly nested XHTML automatically", auto-formatting, and auto-texturizing(?). All three of these features are so buggy that I had to disable them.
    • "Correct invalidly nested XHTML automatically" doesn't let me nest blockquotes, and actually changed the original HTML for one post that uses nested blockquotes. It was easy to turn off.
    • Auto-formatting added <br /> tags for every line break and put <p> tags in many places they didn't belong, such as inside <textarea> tags. It also inexplicably added backslashes before double-quotes within <pre> blocks in this post and this post. I had to install the Text Control plugin to disable it.
    • Auto-texturizing, which turns quotes into “smart” quotes. It got the direction of quotes wrong in this post. It also smartified quotes in places they shouldn't have been smartified, such as in examples of HTML code and examples of Google queries. The Text Control plugin let me disable this feature too.

    Before I tried turning off auto-formatting altogether, I tried using Markdown instead of WordPress's default auto-formatter. Markdown fixed some of the problems, but not all of them, and introduced some new (minor) problems. I ended up disabling it too.

    Once I had disabled auto-formatting, I was left with the problem that many of my posts didn't have <p> tags, because I had relied on the "Convert line breaks" option in MT. I wrote a bookmarklet to add those tags to posts that needed them. (Note: this bookmarklet doesn't handle <!--more--> well.)

  • Headers in posts. In my modified Movable Type template, posts were <H2>, so sections of posts were <H3>. In Kubrick, posts are <H3>, so sections of posts should be <H4>. I made this change manually in old posts. This change should make Planet happier too.
  • Home page title. I wanted the front page of The Burning Edge to have the title "The Burning Edge - Firefox nightly build changelog" rather than just "The Burning Edge" to help search engines and search engine visitors. My solution was to add a PHP conditional statement using is_home() to the part of header.php that generates the <title> tag.
  • Redirecting individual archives. I wanted to redirect old Movable Type URLs to new WordPress URLs because many people have linked to the old URLs. I used a modified version of this Movable Type template to generate .htaccess redirect rules. I had to add encode_php="qq" and str_replace("_", "-", ...) in order to make my URLs correct.
  • Redirecting other URLs. I redirected what I thought were the most important category, RSS, and category RSS URLs manually by adding .htaccess files. I'll go through my error logs in a few days to see what other URLs need redirecting.
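The pre-import conversion some switchers used can also be done with a short Python script instead of a text editor. The script assumes the export really is ISO-8859-1 throughout. The file names in the last comment are hypothetical. The decode-as-UTF-8 line shows one way such text gets mangled, not necessarily exactly what WordPress's importer did.

```python
from pathlib import Path

def to_utf8_bytes(latin1_bytes):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    # Every byte is a valid ISO-8859-1 character, so this decode never fails.
    return latin1_bytes.decode("iso-8859-1").encode("utf-8")

def convert_export(src, dst):
    """Write a UTF-8 copy of an ISO-8859-1 Movable Type export file."""
    Path(dst).write_bytes(to_utf8_bytes(Path(src).read_bytes()))

# Read as UTF-8, the Latin-1 byte for "é" (0xE9) is not valid on its own,
# so it comes out as the replacement character U+FFFD.
raw = "Café".encode("iso-8859-1")
print(raw.decode("utf-8", errors="replace"))
print(to_utf8_bytes(raw).decode("utf-8"))  # Café

# convert_export("mt-export.txt", "mt-export-utf8.txt")  # hypothetical file names
```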