Archive for the 'Google' Category

Accidental Googlebomb

Thursday, January 1st, 2009

This Google search now maligns C++. Oops!

Meeting spot

Monday, October 8th, 2007

Google suggests holding tomorrow's leak meeting on a cruise ship.

Somehow I don't think that would work very well. Leaks and ships don't get along perfectly.

Googlebombing “leave”?

Sunday, May 13th, 2007

A Google search for "leave" still reflects the time when most porn sites had "age verification" on their front pages. "Age verification" often took the form of the text "You must be 18 to enter" followed by "Enter" and "Leave" links. The "Leave" link would often lead to a site appropriate for young kids or to a sex-education site.

Even today, when few new sites follow this practice, "Leave No Trace" and "Leave It To Beaver" are beaten by Yahoo, Google, Scarleteen, and Disney.

I wondered why Google's algorithm continued to make this possible despite tweaks to prevent Googlebombs such as "miserable failure". I came across this comment by Google engineer Matt Cutts:

[The algorithm change] really does have a very limited scope and doesn’t affect a large fraction of queries. The intent of the algorithm is to minimize the impact of “true” Googlebombs, which occur when someone is causing someone else’s page to rank for stuff that they wouldn’t want to rank for themselves. The algorithm could detect phrases such as [leave] as a Googlebomb in future iterations, but it doesn’t right now and I don’t think that Disney would care much either way.

Googlebombs were slightly embarrassing, but I imagine that abandoning link text would have hurt search quality a lot. I'm impressed that Google was able to come up with an algorithmic way to distinguish Googlebombs from other link text.

Google 411

Friday, April 20th, 2007

Google launched a free 411 service just in time for my move from San Diego to Mountain View. I found it useful, but it could have been even more useful if it:

  • Gave better location information for things in malls. For example, the Goodwill donation spot at 570 Showers Drive would be better described as "in the parking lot near Mervyns and near Showers Drive".
  • Knew the difference between Goodwill stores and Goodwill donation spots.
  • Knew how to answer questions like "Where can I find a Denny's along I-5 North within the next hour?", rather than simple city and radius searches.

Store hours would be nice too, but the service would also have to know when to say something like "Beach City Grill closes whenever the owner feels like closing, so you are advised to call before driving there."

Bundled software in security updates

Saturday, October 28th, 2006

Today's Java security update includes a checked-by-default "Install Google Toolbar for Internet Explorer" option. Shame on you, Sun and Google. Automatic security updates are no place to push unrelated, bundled software. Making security updates annoying hurts security almost as much as making security updates complicated: users will be less inclined to update next time.

This is similar to how Flash updates attempt to install the Yahoo Toolbar. It's certainly not as bad as the frequently updated AOL Instant Messenger, which turns on the "Today window" popup on every AIM account and adds a "Netscape ISP" icon to the desktop with every security update. But I thought Google was trying to set a good example.

Squarefree succumbs to the Digg effect

Sunday, September 24th, 2006

Yesterday, at around 4pm, I noticed that the content on was missing, and the main page was an empty directory listing. I ssh'ed to my web server and noticed that the "" directory had been renamed to "squarefree.com_DISABLED_BY_DREAMHOST". Then I checked my email and saw a message from DreamHost support:


I just had to disable your site as it's coming under some load and spawning countless php processes that are crashing the webserver. I wasn't able to figure out exactly what's going on, as leaving it up for more than a minute pretty much toasts the server. Please don't re-enable it until you've figured out what's going on, or disabled any possibly problematic php.



I jumped into #dreamhost on and started looking through my web server logs for suspicious requests. I was expecting to find that my blog had been DDoSed, perhaps by someone trying to leave comment spam. Instead, I found a large number of requests for nonexistent files, falling into two categories:

  • Requests for favicon.ico, a file that did not exist on my site. Some of these requests are expected: most browsers with tabs request favicon.ico to display it in the tab bar. But there were also hundreds of IP addresses that requested nothing but favicon.ico for the entire day, and some requested it many times. About 100 of these IPs were Internet Explorer users with the Google Toolbar, so apparently I was getting DDoSed by a bug in the Google Toolbar. Another 100 were Firefox users; I haven't figured out why Firefox would request nothing but favicon.ico over and over.
  • Requests due to people using my Real-time HTML Editor to edit pages that used relative URLs for images, iframes, etc. One user made dozens of requests for a file named "border=0". Another user made a request for 14 gif files every time the editor refreshed. I also saw from referrers that the Real-time HTML Editor had been featured on Digg, greatly increasing its traffic.
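
Clients in the first category can be picked out of an access log mechanically. Here is a minimal sketch, assuming the log uses Apache's "combined" format; the function name, the regular expression, and the sample lines are all made up for illustration and are not taken from the real logs.

```python
import re
from collections import defaultdict

# Start of an Apache "combined" log line: client IP, two ignored fields,
# the bracketed timestamp, then the quoted request line (method and path).
# The log format is an assumption, not something stated in the post.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"')

def favicon_only_clients(lines):
    """Return {ip: request_count} for clients that requested nothing but /favicon.ico."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ip, path = match.groups()
            paths_by_ip[ip].append(path)
    return {
        ip: len(paths)
        for ip, paths in paths_by_ip.items()
        if all(path == "/favicon.ico" for path in paths)
    }

# Hypothetical sample data using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [23/Sep/2006:16:00:01 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "ExampleAgent"',
    '192.0.2.1 - - [23/Sep/2006:16:00:05 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "ExampleAgent"',
    '192.0.2.2 - - [23/Sep/2006:16:00:07 -0700] "GET / HTTP/1.1" 200 5120 "-" "ExampleAgent"',
    '192.0.2.2 - - [23/Sep/2006:16:00:08 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "ExampleAgent"',
]

print(favicon_only_clients(sample))
```

Grouping by IP first and filtering afterwards keeps ordinary visitors, who fetch a page and then the icon, out of the result.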

But why would 404 requests create PHP processes? Due to a recent change in WordPress, Apache was directing each 404 request to WordPress. WordPress used to put detailed rules in .htaccess -- for example, it would ask Apache to direct requests for to WordPress using RewriteRule ^([0-9]{4})/?$. But newer versions of WordPress instead ask Apache to send it all requests for nonexistent files. I imagine this puts less strain on Apache when a site uses lots of WordPress Pages, but it hurts when a site gets lots of 404 requests. Several months ago, I had instructed WordPress to serve my custom 404 page for these requests, but WordPress still had to do a lot of work to determine that the requests should be treated as 404s.
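
The difference between the two routing styles can be sketched in a few lines. This is a simplified model, not WordPress's or Apache's actual logic: it uses only the single year-archive pattern quoted above, and the helper names and the set of existing files are hypothetical.

```python
import re

# Old style: one specific pattern per URL shape. Only the year-archive
# pattern from the text is modeled here; a real .htaccess had many more.
YEAR_ARCHIVE = re.compile(r"^([0-9]{4})/?$")

def old_style_goes_to_wordpress(path):
    """Only paths matching a known pattern are handed to WordPress."""
    return YEAR_ARCHIVE.match(path) is not None

def new_style_goes_to_wordpress(path, existing_files):
    """Every request for a file that does not exist is handed to WordPress."""
    return path not in existing_files

# Hypothetical set of files that exist on disk.
existing = {"index.php", "style.css"}

for path in ["2006/", "favicon.ico", "border=0", "style.css"]:
    print(
        path,
        old_style_goes_to_wordpress(path),
        new_style_goes_to_wordpress(path, existing),
    )
```

Under the old style, a request for a missing favicon.ico never reaches PHP; under the new style, every such request starts a PHP process just to produce a 404.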

Once I realized what had happened, and determined that reconfiguring WordPress would be difficult, I did what I could to reduce the number of 404 requests WordPress would have to handle. I created a tiny favicon.ico file so those requests wouldn't be 404s, and I moved the Real-time HTML Editor onto its own subdomain so WordPress wouldn't handle the 404s it causes. My site was only down for 40 minutes, with the Real-time HTML Editor down a little longer while I waited for the new subdomain's DNS to propagate.

Some things DreamHost could have done better:

  • It would have been nice if James had disabled PHP for my domain instead of disabling my site entirely. Pornzilla did not need to be down due to PHP problems.
  • A per-user process limit might have allowed my site to send "503 Service Unavailable" in response to some requests instead of being down entirely. It would have also prevented my site from causing problems for other sites on the shared server.
  • Better performance diagnostics would have helped both James and me isolate the problem. For example, it would have been great to have a list of PHP processes showing the request URL that caused each PHP instance to be triggered, the lifetime of each process, and perhaps some performance information (CPU used, RAM used, number of database requests).

Some things DreamHost did right:

  • DreamHost allowed me to restore my site myself once I fixed the problems. All I had to do was rename "squarefree.com_DISABLED_BY_DREAMHOST" back to "".
  • Knowing about DreamHost's .snapshot feature kept me from panicking about data loss when my site appeared to have disappeared.
  • The employees in #dreamhost were helpful.

If anyone is wondering: yes, I still love DreamHost.

New version of Search Keys

Tuesday, November 29th, 2005

I released a new version of Search Keys to make it work with current versions of the Google and web sites. (Search Keys is a Firefox extension that lets you press a number to go to a result in a search engine, so you don't have to remove a hand from the keyboard after typing a search query.)

Security holes in Google Desktop Search fixed

Sunday, July 17th, 2005

Google recently fixed several holes in Google Desktop Search that I found. This is the email I sent to to report the holes:

This combination of security holes in multiple products allows an attacker to read text files indexed and cached by Google Desktop Search. Its success rate is proportional to the amount of time the attacker can keep the victim on the attacker's site and the victim's CPU speed. I think all parts of this attack would work against both Firefox and Internet Explorer, but I've only tested part 1 and only in Firefox.

Recover the URL for the home page of Google Desktop Search

The URL for the front page of Google Desktop Search is for some 10-digit string nnnnnnnnnn. If the string is incorrect, GDS returns a page that says "Invalid Request". This seems to be a second line of defense against XSS and CSRF attacks.

Most browsers have information leaks that allow web scripts to determine whether a link is visited. The attacker assumes that the user has visited the GDS start page with the correct value for nnnnnnnnnn recently enough that the URL is in the browser's global history. Based on my experiments and calculations, it would take several days of CPU time for a script in an untrusted web page in Firefox to find out which of the 10^10 links of the form is visited. An attacker might try to keep a victim on a page for several days, or might try to keep a large number of users on his site for a shorter period of time. I don't know what algorithm generates the value nnnnnnnnnn, so I don't know if it has weaknesses that might allow the attacker's script to test fewer than 10^10 URLs.
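
The arithmetic behind the "several days" estimate is easy to reproduce. The sketch below is only back-of-the-envelope: the measured checking rate is not given here, so the rate of 40,000 links per second is a hypothetical figure chosen for illustration, and the function names are made up.

```python
SECONDS_PER_DAY = 86_400

def days_to_test_all(digits, checks_per_second):
    """Days needed to test every decimal string of the given length."""
    return 10 ** digits / checks_per_second / SECONDS_PER_DAY

def implied_rate(digits, days):
    """Checks per second needed to cover the whole space in the given time."""
    return 10 ** digits / (days * SECONDS_PER_DAY)

# Hypothetical rate, not a measurement from the original experiments.
ASSUMED_RATE = 40_000

print(days_to_test_all(10, ASSUMED_RATE))  # 10-digit value, as described
print(implied_rate(10, 3))                 # rate needed to finish in 3 days
print(days_to_test_all(12, ASSUMED_RATE))  # two more digits: 100x longer
```

Each extra decimal digit multiplies the search time by ten, and splitting the range across many visitors divides the time each one must spend by the number of visitors.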

Solutions: GDS could use a longer salt, to make iterating through every possible salt value harder. GDS could restrict salts to single use, but I think this would break too many things. Firefox (and other browsers) could plug the information leaks in global history.


Perform a Princeton DNS attack

First, make resolve to an IP under the control of the attacker, with a short TTL. Make the victim load, which contains a script. Then make resolve to The script then creates an iframe that loads and uses cross-frame scripting to control the page served by GDS.

You can check that GDS does not prevent this part of the attack by loading GDS and then replacing in the URL with (which resolves to

Solutions: GDS could reject requests where the hostname is not "" or "localhost" (IMO, the HTTP protocol requires it to do so). Firefox, Windows XP, the Windows XP firewall, or my ISP could prevent "external" DNS names from resolving to "internal" IP
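
The hostname check suggested above could look roughly like the following. This is a sketch of the general technique, not GDS's actual code: the function name is hypothetical, the allowlist contains only "localhost" because the other permitted hostname is missing from the text above, and IPv6 literals are not handled.

```python
def host_is_allowed(host_header, allowed_hosts):
    """Return True if the request's Host header names an allowed host."""
    if not host_header:
        # A missing or empty Host header is rejected outright.
        return False
    host = host_header.strip().lower()
    # Drop an optional ":port" suffix. IPv6 literals are out of scope here.
    host = host.split(":", 1)[0]
    return host in allowed_hosts

# Assumption: only "localhost" is listed; the second hostname named in the
# text is elided there and is deliberately not guessed here.
ALLOWED = {"localhost"}

print(host_is_allowed("localhost", ALLOWED))
print(host_is_allowed("LOCALHOST:8080", ALLOWED))    # hypothetical port
print(host_is_allowed("attacker.example", ALLOWED))  # hypothetical attacker name
```

Because the browser sends the attacker's hostname in the Host header even after that name has been re-pointed at the local machine, a server-side check like this stops the attack no matter what the DNS says.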


Combining the holes

Once the attacker has script access to, has resolving to, and knows the hash for the home page, he can search for text files and view cached text files. (The links to cached text files are absolute and have as the hostname, but they continue to work when is replaced by, which resolves to

I sent this email on Feb 13, 2005. The first part was fixed in version 20050227 by making the salt longer. The second part was fixed in version 20050325 by making GDS reject requests with hostnames other than "" and "localhost". Google started pushing the updated version to existing users on June 2, 2005, so most users should be upgraded by now. You can see what version of GDS you have by clicking "About".

This is not the same as the hole found by Rice students (Slashdot article), which had been fixed previously.