Improving incentives for web advertisers

August 10th, 2011

When users install ad filters out of a desire to avoid unpleasant ads, they usually end up blocking all ads. This does little to incentivize individual sites to clean up their ads, because from the perspective of any web site owner, users decide whether to block ads mostly based on their experiences on other sites.

As more users turn to ad filters, debates about ad filtering are becoming increasingly polarized. A pro-ad faction screams “blocking ads is like stealing from web sites”. An anti-ad faction screams “showing me ads is like kicking me in the groin in the hope that a penny will fly out of my pocket”.

A better way?

What if a future version of Adblock Plus only tried to block bad ads by default? The immediate result would be negligible, because most ad networks today are somewhere between bad and terrible. But some users would feel more comfortable enabling the blocks, and web site owners would have a harder time blaming visitors for missed revenue.

[Screenshot: current Adblock Plus first-run page]

Proposed options and defaults:

  • Block distracting ads (e.g. animations, sounds, interstitials)
  • Block slow ads (e.g. plugins, scripts that block parsing)

“Block distracting ads” would block ads that animate, ads with bright pink, gigantic ads, <audio> ads, ads that use absolute positioning to cover other content, and plugins.

“Block slow ads” would block ad scripts that do not use async or defer, any ad that uses more than 5 sequential or 10 total requests, any ad content that hasn't finished loading after 500ms, and plugins.

Note that these are all things that can be detected by the client, which already has a filter set that distinguishes ads from non-ads. Upon blocking an ad through one of these heuristics, the entire ad network should be blocked for a period of time, so that Firefox does not waste time downloading things it will not display. The filter set could also specify ad networks known to specialize in distracting ads, or ad networks that are slow in ways the heuristics miss.
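As a rough sketch of how a client might combine these heuristics with temporary network-wide blocking (all names, thresholds, and the blocking period here are illustrative, not actual Adblock Plus code):

```python
# Sketch of the client-side "slow ad" heuristics described above.
# All names and thresholds are illustrative.
from dataclasses import dataclass

MAX_SEQUENTIAL_REQUESTS = 5
MAX_TOTAL_REQUESTS = 10
LOAD_DEADLINE_MS = 500
NETWORK_BLOCK_SECONDS = 3600  # how long to skip a misbehaving ad network

@dataclass
class AdLoad:
    script_is_async: bool     # ad script used async or defer
    sequential_requests: int  # longest chain of dependent requests
    total_requests: int
    load_time_ms: float

def is_slow(ad: AdLoad) -> bool:
    """True if the ad trips any of the slow-ad heuristics."""
    return (not ad.script_is_async
            or ad.sequential_requests > MAX_SEQUENTIAL_REQUESTS
            or ad.total_requests > MAX_TOTAL_REQUESTS
            or ad.load_time_ms > LOAD_DEADLINE_MS)

class NetworkBlocker:
    """After blocking one ad, skip its whole network for a while so the
    browser doesn't waste time downloading things it won't display."""
    def __init__(self):
        self._blocked_until = {}  # network domain -> unix timestamp

    def note_blocked(self, network: str, now: float) -> None:
        self._blocked_until[network] = now + NETWORK_BLOCK_SECONDS

    def is_blocked(self, network: str, now: float) -> bool:
        return now < self._blocked_until.get(network, 0.0)
```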

Blocking manipulation

The heuristics above would block ads that interfere directly with your web experience, but what about ads that harm you in slightly subtler ways? Maybe activists would be inspired to curate subsets of existing filters, focusing on their causes:

One ad filter set:

  • e.g. non-evidence-based medicine, “Free*”
  • e.g. appeals to feelings of inadequacy
  • e.g. sexual puns, gratuitous cleavage

Another ad filter set:

  • e.g. lingerie, “adult dating”
  • e.g. action films, appeals to fear
  • e.g. junk food, tobacco
  • e.g. toys, nag coaching

These would block almost every ad network today, assuming the curators err on the side of over-blocking when a network carries multiple types of ads. I can only name one ad network that demonstrates the slightest bit of competence at keeping out scams and one ad network that actively gathers feedback from viewers about individual ads.

With improved incentives, more ad networks would try to do the right thing.

I look forward to a future where advertising is a truly low-transaction-cost way to compensate free content providers.

I look forward to innovators and creators once again having a way to connect with people who might genuinely stand to benefit from their work, without having their voices drowned out by screaming scammers.

More skimmable diffs on hgweb

June 19th, 2011

I've found it hard to skim diffs on hgweb (example). The lack of visual hierarchy meant I often didn't notice the jumps between files, even when I was looking for them.

I wrote a user stylesheet to fix this. Between files, it adds a gray bar, vertical space, and outdented filenames. Between sections of a file, it adds a small amount of vertical space. (screenshot before, screenshot after)

To install hgweb-visual-hierarchy.css, add it to chrome/userContent.css in your Firefox profile directory. You may need to create the subdirectory and file.

Tracking after-fix tasks

June 10th, 2011

I often come across a bug report and decide that I want to do something once it's fixed:

  • Once bug X is fixed, tweet joyfully.
  • Once bug X is fixed, update some documentation.
  • Once bug X is fixed, add a feature to a fuzzer.
  • Once bug X is fixed, retest a fuzz testcase that I assumed triggered bug X but might in fact trigger a different bug.
  • Once bug X is fixed, add a regression test for bug W.
  • Once bug X is fixed, see if it also fixed bug Y.

I could CC myself on the bug, but then I'd get lots of email and might forget my reason for being CCed. I could create a dependency, but that sends everyone else confusing bugspam and gives me notifications that are easy to miss.

The after-fix tool

I created a tool called after-fix to track these bug-dependent tasks for me. I have a large after-fix config file with entries like:

# Make use of new GC APIs in DOM fuzzer bug 661469

I'll see my note the next time I run after-fix after bug 661469 is fixed.
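I haven't described after-fix's internals here, but the core idea can be sketched against Bugzilla's REST API. The entry-parsing regex and function names below are my own assumptions, modeled on the example entry above:

```python
import json
import re
import urllib.request

# Hypothetical entry format, matching the example above: a comment whose
# note ends with the bug number it waits on.
ENTRY_RE = re.compile(r"#\s*(?P<note>.+?)\s+bug\s+(?P<bug>\d+)\s*$")

def parse_config(text):
    """Yield (bug number, note) pairs from an after-fix-style config."""
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            yield int(m.group("bug")), m.group("note")

def bug_is_fixed(bug_id, fetch=None):
    """Ask Bugzilla's REST API whether a bug is resolved as FIXED.
    `fetch` is injectable so the network call can be faked in tests."""
    url = ("https://bugzilla.mozilla.org/rest/bug/%d"
           "?include_fields=status,resolution" % bug_id)
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    bug = json.loads(fetch(url))["bugs"][0]
    return bug["status"] in ("RESOLVED", "VERIFIED") and bug["resolution"] == "FIXED"
```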

Pruning workarounds

I've also been using after-fix to ensure workarounds don't outlive their purpose:

Faster crash analysis with stack-blame

June 1st, 2011

When a new crash appears, we often want to know whether any code on the stack changed recently. Historically, this has required opening each source link in a new tab and waiting for hgweb to generate megabytes of HTML. Even if you only look at the top 10 frames of each stack, this gets boring quickly.

I created stack-blame to make this easier. It shows 5 lines of context around each line of the stack trace, and highlights fresh lines in green. In the stack-blame output for this crash report, it's easy to see that of the cairo functions on the stack, only _cairo_quartz_surface_finish was touched recently.
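The two pieces such a tool needs — a context window around each frame's line, and a freshness test against blame dates — can be sketched as follows. This is not stack-blame's actual code; the freshness cutoff and names are hypothetical:

```python
import datetime

CONTEXT = 5       # lines of context on each side, as stack-blame shows
FRESH_DAYS = 30   # hypothetical cutoff for highlighting a line as fresh

def context_window(lines, crash_line, context=CONTEXT):
    """Return (1-based line number, text) pairs around the crashing line."""
    lo = max(0, crash_line - 1 - context)
    hi = min(len(lines), crash_line + context)
    return [(i + 1, lines[i]) for i in range(lo, hi)]

def fresh_lines(blame_dates, now, days=FRESH_DAYS):
    """Given {line number: date of last change} (e.g. parsed from
    `hg annotate -d` output), return the line numbers changed recently."""
    cutoff = now - datetime.timedelta(days=days)
    return {n for n, d in blame_dates.items() if d >= cutoff}
```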

Let’s give musicians an alternative to copyright

December 10th, 2010

Submitted to the US Department of Commerce in response to their call for comments on Copyright Policy, Creativity, and Innovation in the Information Economy.

Let's allow musicians to choose to give up their monopoly on distribution rights. In return, let's give them money in proportion to the popularity of their works, starting with money the government and ISPs would have spent on copyright enforcement.

Copyright is a poor mechanism for encouraging creation

The premise of copyright is that the monopoly rent of a work is a good proxy for the benefit to society of the work's existence. This is no longer the case, at least for recorded music.

Copyright limits the societal benefit of the work's existence, because:

  • People who can't afford recorded music don't get to enjoy it.
  • People often can't listen to songs before purchasing them, so they purchase too few songs or the wrong songs.
  • People are denied the joy of sharing music with their friends.

The portion of the societal benefit that is apparent in the monopoly rent is very low, because:

  • Transaction costs form a large portion of the purchase price.
  • Promotion costs are high in order to overcome consumer reluctance to spend money on an unknown.

Because of copyright, most of the potential societal benefit of a new song goes to deadweight loss and transaction costs. Only a tiny portion makes it to the musician.

Copyright harms society

Since the rise of the Internet, copyright has begun to have negative externalities that go beyond musicians and listeners:

  • Government resources are spent on copyright enforcement.
  • A hidden tax is levied on internet connections as ISPs are forced to filter, forward notices of infringement, and respond to subpoenas.
  • User-generated content is at risk from fraudulent takedown notices.
  • Popular infringement, combined with sporadic-but-harsh enforcement of copyright laws, diminishes respect for all laws.

Attempts to enforce copyright through DRM software create additional problems:

  • DRM conflicts with fair use.
  • DRM disadvantages open-source software.
  • DRM anti-circumvention laws conflict with free speech among software developers.
  • DRM legitimizes infringement in the minds of users who find they cannot listen to purchased music on a new device.

Copyright is becoming increasingly inefficient and harmful. Let's try an alternative, and let musicians experiment with a wider range of promotion models.

Fuzzing in the pool

November 23rd, 2010

In mid-2009, John O'Duinn offered to let my DOM fuzzer run on the same pool of machines as Firefox regression tests. I'd have an average of 20 computers running my fuzzer across a range of operating systems, and I wouldn't have to maintain the computers. All I had to do was tweak my script to play nicely with the scheduler, and not destroy the machines.

Playing nicely with the scheduler

Counter-intuitively, to maximize the amount of fuzzing, I had to minimize the duration of each fuzz job. The scheduler tries to avoid delays in the regression test jobs so developers don't go insane watching the tree. A low-priority job will be allowed to start much more often if it only takes 30 minutes.

Being limited to 30 minutes means the fuzz jobs don't have time to compile Firefox. Instead, fuzz jobs have to download Tinderbox builds like the regression test jobs do. I fixed several bugs in mozilla-central to make Tinderbox builds work for fuzzing.

I also modified the testcase reducer to split its work into 30-minute jobs. If the fuzzer finds a bug and the reducer takes longer than 30 minutes, it uploads the partially-reduced testcase, along with the reduction algorithm's state, for a subsequent job to continue reducing. To avoid race conditions between uploading and downloading, I use "ssh mv" synchronization.
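The "ssh mv" trick works because `mv` is atomic within a single remote filesystem: upload under a temporary name, then rename, so a downloader sees either the old file or the complete new one, never a partial copy. A minimal sketch (host and paths are placeholders; the `run` parameter is injectable for testing):

```python
import subprocess

def atomic_upload(local_path, host, remote_path, run=subprocess.check_call):
    """Upload a file without readers ever seeing a partial copy: scp to a
    temporary name, then rename with mv over ssh.  Because mv within one
    filesystem is atomic, downloaders see the old state or the new state,
    never a half-written file."""
    tmp = remote_path + ".part"
    run(["scp", local_path, "%s:%s" % (host, tmp)])
    run(["ssh", host, "mv", tmp, remote_path])
```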

Not destroying the test slaves

I wasn't trying to fill up the disks on the test slaves, really!

Early versions of my script filled up /tmp. I had incorrectly assumed that /tmp would be cleared on each reboot. Luckily, Nagios caught this before it caused serious damage.

Due to a security bug in some debug builds of Firefox, the fuzzer created randomly-named files in the home directory. The security bug has been fixed, but I'm afraid RelEng will be finding files named "undefined" and "[object HTMLBodyElement]" for a while.

By restarting Firefox frequently, fuzzing accelerated the creation of gigantic console.log files on the slaves. We're trying to figure out whether to make debug-Firefox not create these files or make BuildBot delete them.

Results so far

Running in the test pool gets me a variety of operating systems. The fuzzer currently runs on Mac32 (10.5), Mac64 (10.6), Linux32, Linux64, and Win32. This allowed me to find a 64-bit-only bug and a Linux-only bug in October. Previously, I had mostly been testing on Mac.

The extra computational power also makes a difference. I can find regressions more quickly (which developers appreciate) and find harder-to-trigger bugs (which developers don't appreciate quite as much). I also get faster results when I change the fuzzer, such as the two convoluted testcases I got shortly after I added document.write fuzzing.

Unexpectedly, getting quick results from fuzzer changes makes me more inclined to tweak and improve the fuzzer. I know the change will still be fresh in my mind when I learn about its effects. This may turn out to be the most important win.

With cross-platform testing and the boost to agility, I suddenly feel a lot closer to being able to share and release the fuzzer.

How my DOM fuzzer ignores known bugs

November 21st, 2010

When my DOM fuzzer finds a new bug, I want it to make a reduced testcase and notify me so I can file a bug report. To keep it from wasting time finding duplicates of known bugs, I maintain several ignore lists:

Some bugs are harder to distinguish based on output. In those cases, I use suppressions based on the fuzzer-generated input to Firefox:

Fixing any bug on those lists improves the fuzzer's ability to find additional bugs. But I'd like to point out a few that I'd especially like fixed:

In rare cases, I'll temporarily tell the fuzzer to skip a feature entirely:

Several bugs interfere with my ability to tell bugs apart. Luckily, they're all platform-specific, so they don't prevent me from finding cross-platform bugs.

  • Bug 610311 makes it difficult to distinguish crashes on Linux, so I ignore crashes there.
  • Bug 612093 makes it difficult to distinguish PR_Asserts and abnormal exits on Windows. (It's fixed in NSPR and needs to be merged to mozilla-central.)
  • Bug 507876 makes it difficult to distinguish too-much-recursion crashes on Mac. (But I don't currently know of any, so I'm not ignoring them at the moment!)
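A suppression scheme like this can be as simple as a signature-to-bug map plus per-platform blanket ignores. A sketch with made-up entries (the real ignore lists are not reproduced here; the signature and bug number below are hypothetical):

```python
# Made-up entries: a real ignore list maps known crash signatures
# (e.g. the top stack frame) to the bug that explains them.
KNOWN_CRASHES = {
    "SomeClass::SomeMethod": "bug 000000 (hypothetical)",
}

# Platforms whose crash signatures can't be trusted (cf. bug 610311 on Linux).
IGNORE_CRASHES_ON = {"Linux"}

def triage_crash(top_frame, platform):
    """Return a known-bug label, 'ignored platform', or None for a new bug."""
    if platform in IGNORE_CRASHES_ON:
        return "ignored platform"
    return KNOWN_CRASHES.get(top_frame)
```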

Detecting leak-until-shutdown bugs

November 14th, 2010

Most of Mozilla's leak-detection tools work on the premise that when the application exits, no objects should remain. This strategy finds many types of leak bugs: I've used tools such as trace-refcnt to find over a hundred. But it misses bugs where an object lives longer than it should.

The worst of these are bugs where an object lives until shutdown, but is destroyed during shutdown. These leaks affect users as much as any other leak, but most of our tools don't detect them.

After reading about an SVG leak-until-shutdown bug that the traditional tools missed, I wondered if I could find more bugs of that type.

A new detector

I started with the premise that if I close all my browser windows (but open a new one so Firefox doesn't exit), the number of objects held alive should not depend on what I did in the other windows. I retrofitted my DOM fuzzer with a special exit sequence:

  1. Open a new, empty window
  2. Close all other windows
  3. Wait until memory use stabilizes
  4. Count the remaining objects (should be constant)
  5. Continue with the normal shutdown sequence
  6. Count the remaining objects (should be 0)

If the first count of remaining objects depends on what I did earlier in the session, and the second count is 0, I've probably found a leak-until-shutdown bug.

To reduce noise, I had to disable the XUL cache and restrict the counting to nsGlobalWindow and nsDocument objects. On Linux, I normally count 4 nsGlobalWindows and 4 nsDocuments.
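The two checks can be sketched as a comparison against that baseline (a hypothetical function, using the class names and Linux counts from the text):

```python
# Typical Linux baseline from the text: 4 nsGlobalWindows, 4 nsDocuments.
BASELINE = {"nsGlobalWindow": 4, "nsDocument": 4}

def classify_leaks(after_close, after_shutdown, baseline=BASELINE):
    """Apply the two checks from the exit sequence.  Objects beyond the
    baseline after all other windows close are leak-until-shutdown
    suspects; anything left after shutdown is a traditional leak."""
    until_shutdown = {cls: n - baseline.get(cls, 0)
                      for cls, n in after_close.items()
                      if n > baseline.get(cls, 0)}
    past_shutdown = {cls: n for cls, n in after_shutdown.items() if n > 0}
    return until_shutdown, past_shutdown
```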

So far, I've found two bugs where additional objects remain:

I'm glad we found the <video> leak before shipping Firefox 4!

Note that this tool can't find all types of leaks. It won't catch leak-until-page-close bugs or other leaks with relatively short lifetimes. It can't tell you if a cache is misbehaving or if cycle collection isn't being run often enough.

Next steps

Depending on how promising we think this approach is, we could:

  • Use it in more types of testing
    • Package it into a more user-friendly extension for Firefox debug builds
    • Make it a regular part of fuzzing
    • Use it for regression tests
  • Add something to Gecko that's similar but less kludgy
  • Expand the classes it will complain about
  • Debug the flakiness with smaller objects
  • Make the XUL cache respond to memory-pressure notifications

It's also possible that DEBUG_CC, and in particular its "expected to be garbage" feature, will prove able to find a superset of the leaks my tool can find.