Archive for the 'Mozilla' Category

Improving incentives for web advertisers

Wednesday, August 10th, 2011

When users install ad filters out of a desire to avoid unpleasant ads, they usually end up blocking all ads. This does little to incentivize individual sites to clean up their ads, because from the perspective of any web site owner, users decide whether to block ads mostly based on their experiences on other sites.

As more users turn to ad filters, debates about ad filtering are becoming increasingly polarized. A pro-ad faction screams “blocking ads is like stealing from web sites”. An anti-ad faction screams “showing me ads is like kicking me in the groin in the hope that a penny will fly out of my pocket”.

A better way?

What if a future version of Adblock Plus only tried to block bad ads by default? The immediate result would be negligible, because most ad networks today are somewhere between bad and terrible. But some users would feel more comfortable enabling the blocks, and web site owners would have a harder time blaming visitors for missed revenue.


[Screenshot: Current Adblock Plus first run page]

[Mockup: Proposed options and defaults]

Ad filter set:
  • Block distracting ads (e.g. animations, sounds, interstitials)
  • Block slow ads (e.g. plugins, scripts that block parsing)

“Block distracting ads” would block ads that animate, ads with bright pink, gigantic ads, <audio> ads, ads that use absolute positioning to cover other content, and plugins.

“Block slow ads” would block ad scripts that do not use async or defer, any ad that uses more than 5 sequential or 10 total requests, any ad content that hasn't finished loading after 500ms, and plugins.

Note that these are all things that can be detected by the client, which already has a filter set that distinguishes ads from non-ads. Upon blocking an ad through one of these heuristics, the entire ad network should be blocked for a period of time, so that Firefox does not waste time downloading things it will not display. The filter set could also specify ad networks known to specialize in distracting ads, or ad networks that are slow in ways the heuristics miss.
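A minimal sketch of the "Block slow ads" heuristic and the temporary network block, in Python rather than in extension code. The thresholds come from the description above. The `AdObservation` fields, the class and function names, and the ten-minute cool-off are assumptions made for illustration, not part of Adblock Plus.

```python
from dataclasses import dataclass

# Thresholds taken from the proposal above.
MAX_SEQUENTIAL_REQUESTS = 5
MAX_TOTAL_REQUESTS = 10
MAX_LOAD_MS = 500

# Assumed cool-off period; the proposal only says "a period of time".
NETWORK_BLOCK_SECONDS = 600


@dataclass
class AdObservation:
    """What the client measured for one ad (hypothetical structure)."""
    network: str
    blocking_script: bool     # ad script loaded without async or defer
    sequential_requests: int
    total_requests: int
    load_ms: float
    uses_plugin: bool


def is_slow(ad: AdObservation) -> bool:
    """Return True if the ad trips any of the 'slow ad' rules."""
    return (
        ad.blocking_script
        or ad.sequential_requests > MAX_SEQUENTIAL_REQUESTS
        or ad.total_requests > MAX_TOTAL_REQUESTS
        or ad.load_ms > MAX_LOAD_MS
        or ad.uses_plugin
    )


class NetworkBlocklist:
    """Blocks a whole ad network for a while after one of its ads is slow."""

    def __init__(self):
        self._blocked_until = {}  # network name -> time the block expires

    def record(self, ad: AdObservation, now: float) -> None:
        if is_slow(ad):
            self._blocked_until[ad.network] = now + NETWORK_BLOCK_SECONDS

    def is_blocked(self, network: str, now: float) -> bool:
        return now < self._blocked_until.get(network, 0.0)
```

Checking `is_blocked` before issuing any request to a network is what would keep the browser from downloading ads it will not display.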

Blocking manipulation

The heuristics above would block ads that interfere directly with your web experience, but what about ads that harm you in slightly subtler ways? Maybe activists would be inspired to curate subsets of existing filters, focusing on their causes:

Ad filter set:
  • e.g. non-evidence-based medicine, “Free*”
  • e.g. appeals to feelings of inadequacy
  • e.g. sexual puns, gratuitous cleavage

Ad filter set:
  • e.g. lingerie, “adult dating”
  • e.g. action films, appeals to fear
  • e.g. junk food, tobacco
  • e.g. toys, nag coaching

These would block almost every ad network today, assuming the curators err on the side of over-blocking when a network carries multiple types of ads. I can only name one ad network that demonstrates the slightest bit of competence at keeping out scams and one ad network that actively gathers feedback from viewers about individual ads.

With improved incentives, more ad networks would try to do the right thing.

I look forward to a future where advertising is a truly low-transaction-cost way to compensate free content providers.

I look forward to innovators and creators once again having a way to connect with people who might genuinely stand to benefit from their work, without having their voices drowned out by screaming scammers.

More skimmable diffs on hgweb

Sunday, June 19th, 2011

I've found it hard to skim diffs on hgweb (example). The lack of visual hierarchy meant I often didn't notice the jumps between files, even when I was looking for them.

I wrote a user stylesheet to fix this. Between files, it adds a gray bar, vertical space, and outdented filenames. Between sections of a file, it adds a small amount of vertical space. (screenshot before, screenshot after)

To install hgweb-visual-hierarchy.css, add it to chrome/userContent.css in your Firefox profile directory. You may need to create the subdirectory and file.

Tracking after-fix tasks

Friday, June 10th, 2011

I often come across a bug report and decide that I want to do something once it's fixed:

  • Once bug X is fixed, tweet joyfully.
  • Once bug X is fixed, update some documentation.
  • Once bug X is fixed, add a feature to a fuzzer.
  • Once bug X is fixed, retest a fuzz testcase that I assumed triggered bug X but might in fact trigger a different bug.
  • Once bug X is fixed, add a regression test for bug W.
  • Once bug X is fixed, see if it also fixed bug Y.

I could CC myself to the bug, but then I'll get lots of email and might forget my reason for being CCed. I could create a dependency, but that sends everyone else confusing bugspam and gives me notifications that are easy to miss.

The after-fix tool

I created a tool called after-fix to track these bug-dependent tasks for me. I have a large after-fix config file with entries like:

# Make use of new GC APIs in DOM fuzzer bug 661469

I'll see my note the next time I run after-fix after bug 661469 is fixed.
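A rough sketch of how such a config file could be processed. It assumes each entry is a single line of the form shown above (`# note bug N`). The `is_fixed` callback stands in for whatever bug-status lookup the real tool performs, and all function names here are hypothetical.

```python
import re

# Assumed entry format: "# <note> bug <number>", one entry per line.
ENTRY = re.compile(r"^#\s*(?P<note>.+?)\s+bug\s+(?P<bug>\d+)\s*$")


def parse_entries(text):
    """Return a list of (bug_number, note) pairs found in the config text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append((int(match.group("bug")), match.group("note")))
    return entries


def due_notes(text, is_fixed):
    """Return the notes whose bug is reported as fixed by is_fixed(bug)."""
    return [note for bug, note in parse_entries(text) if is_fixed(bug)]


config = "# Make use of new GC APIs in DOM fuzzer bug 661469\n"
# Pretend the bug tracker says 661469 is fixed.
print(due_notes(config, is_fixed=lambda bug: bug == 661469))
```

Keeping the note next to the bug number is the point: the reminder carries its own reason, which a bare CC does not.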

Pruning workarounds

I've also been using after-fix to ensure workarounds don't outlive their purpose:

Faster crash analysis with stack-blame

Wednesday, June 1st, 2011

When a new crash appears, we often want to know whether any code on the stack changed recently. Historically, this has required opening each source link in a new tab and waiting for hgweb to generate megabytes of HTML. Even if you only look at the top 10 frames of each stack, this gets boring quickly.

I created stack-blame to make this easier. It shows 5 lines of context around each line of the stack trace, and highlights fresh lines in green. In the stack-blame output for this crash report, it's easy to see that of the cairo functions on the stack, only _cairo_quartz_surface_finish was touched recently.
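The core idea can be sketched in a few lines of Python. This is not the stack-blame source: the shape of the blame data, the 30-day definition of "fresh", the two lines of context on either side of the frame, and the text markers used in place of green highlighting are all assumptions.

```python
from datetime import date, timedelta


def annotate_frame(blame, frame_line, today, fresh_days=30, context=2):
    """Render the lines around one stack frame, marking recently changed ones.

    blame      -- list of (last_changed: date, text: str); index 0 is line 1
    frame_line -- 1-based line number reported in the stack trace
    Returns a list of strings: '+' marks a fresh line, '>' marks the frame.
    """
    cutoff = today - timedelta(days=fresh_days)
    first = max(1, frame_line - context)
    last = min(len(blame), frame_line + context)
    rendered = []
    for lineno in range(first, last + 1):
        changed, text = blame[lineno - 1]
        fresh = "+" if changed >= cutoff else " "
        pointer = ">" if lineno == frame_line else " "
        rendered.append(f"{fresh}{pointer}{lineno:5d} {text}")
    return rendered
```

Running this over every frame of a stack and scanning for `+` markers gives the same at-a-glance answer as the green highlighting: which frames sit in recently touched code.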

Fuzzing in the pool

Tuesday, November 23rd, 2010

In mid-2009, John O'Duinn offered to let my DOM fuzzer run on the same pool of machines as Firefox regression tests. I'd have an average of 20 computers running my fuzzer across a range of operating systems, and I wouldn't have to maintain the computers. All I had to do was tweak my script to play nicely with the scheduler, and not destroy the machines.

Playing nicely with the scheduler

Counter-intuitively, to maximize the amount of fuzzing, I had to minimize the duration of each fuzz job. The scheduler tries to avoid delays in the regression test jobs so developers don't go insane watching the tree. A low-priority job will be allowed to start much more often if it only takes 30 minutes.

Being limited to 30 minutes means the fuzz jobs don't have time to compile Firefox. Instead, fuzz jobs have to download Tinderbox builds like the regression test jobs do. I fixed several bugs in mozilla-central to make Tinderbox builds work for fuzzing.

I also modified the testcase reducer to split its work into 30-minute jobs. If the fuzzer finds a bug and the reducer takes longer than 30 minutes, it uploads the partially-reduced testcase, along with the reduction algorithm's state, for a subsequent job to continue reducing. To avoid race conditions between uploading and downloading, I use "ssh mv" synchronization.
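To illustrate the idea of a reduction that can stop at a deadline and resume later, here is a small sketch. It uses a simple chunk-removal strategy chosen for brevity, not the reducer's actual algorithm, and the state dictionary and function names are hypothetical. In the real setup the returned state would be uploaded along with the partially reduced testcase.

```python
import time


def initial_state(lines):
    """Start by trying to remove chunks half the size of the testcase."""
    return {"lines": list(lines), "chunk": max(1, len(lines) // 2),
            "pos": 0, "done": False}


def reduce_with_budget(state, is_interesting, budget_seconds,
                       clock=time.monotonic):
    """Continue reducing from `state` until finished or out of time.

    is_interesting(lines) must return True while the bug still reproduces.
    The returned state can be passed to a later call to pick up where this
    one stopped.
    """
    lines, chunk, pos = list(state["lines"]), state["chunk"], state["pos"]
    deadline = clock() + budget_seconds
    while chunk >= 1:
        if clock() >= deadline:
            # Out of time: hand back everything needed to resume.
            return {"lines": lines, "chunk": chunk, "pos": pos, "done": False}
        if pos >= len(lines):
            # Finished a pass at this chunk size; try smaller chunks.
            chunk, pos = chunk // 2, 0
            continue
        candidate = lines[:pos] + lines[pos + chunk:]
        if is_interesting(candidate):
            lines = candidate   # removal kept the bug; stay at this position
        else:
            pos += chunk        # chunk was needed; move past it
    return {"lines": lines, "chunk": 0, "pos": 0, "done": True}
```

Because the state records the chunk size and position as well as the remaining lines, a second job does not repeat work the first job already did.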

Not destroying the test slaves

I wasn't trying to fill up the disks on the test slaves, really!

Early versions of my script filled up /tmp. I had incorrectly assumed that /tmp would be cleared on each reboot. Luckily, Nagios caught this before it caused serious damage.

Due to a security bug in some debug builds of Firefox, the fuzzer created randomly-named files in the home directory. This security bug has been fixed, but I'm afraid RelEng will be finding files named "undefined" and "[Object HTMLBodyElement]" for a while.

By restarting Firefox frequently, fuzzing accelerated the creation of gigantic console.log files on the slaves. We're trying to figure out whether to make debug-Firefox not create these files or make BuildBot delete them.

Results so far

Running in the test pool gets me a variety of operating systems. The fuzzer currently runs on Mac32 (10.5), Mac64 (10.6), Linux32, Linux64, and Win32. This allowed me to find a 64-bit-only bug and a Linux-only bug in October. Previously, I had mostly been testing on Mac.

The extra computational power also makes a difference. I can find regressions more quickly (which developers appreciate) and find harder-to-trigger bugs (which developers don't appreciate quite as much). I also get faster results when I change the fuzzer, such as the two convoluted testcases I got shortly after I added document.write fuzzing.

Unexpectedly, getting quick results from fuzzer changes makes me more inclined to tweak and improve the fuzzer. I know that the change will still be fresh in my mind when I learn about its effects. This may turn out to be the most important win.

With cross-platform testing and the boost to agility, I suddenly feel a lot closer to being able to share and release the fuzzer.

How my DOM fuzzer ignores known bugs

Sunday, November 21st, 2010

When my DOM fuzzer finds a new bug, I want it to make a reduced testcase and notify me so I can file a bug report. To keep it from wasting time finding duplicates of known bugs, I maintain several ignore lists:

Some bugs are harder to distinguish based on output. In those cases, I use suppressions based on the fuzzer-generated input to Firefox:

Fixing any bug on those lists improves the fuzzer's ability to find additional bugs. But I'd like to point out a few that I'd especially like fixed:

In rare cases, I'll temporarily tell the fuzzer to skip a feature entirely:

Several bugs interfere with my ability to distinguish bugs. Luckily, they're all platform-specific, so they don't prevent me from finding cross-platform bugs.

  • Bug 610311 makes it difficult to distinguish crashes on Linux, so I ignore crashes there.
  • Bug 612093 makes it difficult to distinguish PR_Asserts and abnormal exits on Windows. (It's fixed in NSPR and needs to be merged to mozilla-central.)
  • Bug 507876 makes it difficult to distinguish too-much-recursion crashes on Mac. (But I don't currently know of any, so I'm not ignoring them at the moment!)
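As a rough illustration of the two kinds of filtering described in this post, the sketch below treats an ignore list as a set of substrings to look for in Firefox's output, and a suppression as a substring to look for in the fuzzer-generated testcase. The function name and the substring matching are assumptions; the real lists may match in other ways.

```python
def is_known_bug(output, testcase, output_ignores, input_suppressions):
    """Return True if this failure looks like a bug that is already known.

    output             -- text Firefox printed (assertions, crash info)
    testcase           -- the fuzzer-generated input that was run
    output_ignores     -- substrings of output that identify known bugs
    input_suppressions -- substrings of input that trigger known bugs whose
                          output is hard to tell apart from new bugs
    """
    if any(pattern in output for pattern in output_ignores):
        return True
    return any(pattern in testcase for pattern in input_suppressions)
```

Only failures for which this returns False would go on to be reduced and reported.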

Detecting leak-until-shutdown bugs

Sunday, November 14th, 2010

Most of Mozilla's leak-detection tools work on the premise that when the application exits, no objects should remain. This strategy finds many types of leak bugs: I've used tools such as trace-refcnt to find over a hundred. But it misses bugs where an object lives longer than it should.

The worst of these are bugs where an object lives until shutdown, but is destroyed during shutdown. These leaks affect users as much as any other leak, but most of our tools don't detect them.

After reading about an SVG leak-until-shutdown bug that the traditional tools missed, I wondered if I could find more bugs of that type.

A new detector

I started with the premise that if I close all my browser windows (but open a new one so Firefox doesn't exit), the number of objects held alive should not depend on what I did in the other windows. I retrofitted my DOM fuzzer with a special exit sequence:

  1. Open a new, empty window
  2. Close all other windows
  3. Wait until memory use stabilizes
  4. Count the remaining objects (should be constant)
  5. Continue with the normal shutdown sequence
  6. Count the remaining objects (should be 0)

If the first count of remaining objects depends on what I did earlier in the session, and the second count is 0, I've probably found a leak-until-shutdown bug.

To reduce noise, I had to disable the XUL cache and restrict the counting to nsGlobalWindow and nsDocument objects. On Linux, I normally count 4 nsGlobalWindows and 4 nsDocuments.
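The decision rule from the exit sequence can be written down as a small Python sketch. The function name and the dictionary representation of the object counts are assumptions; the baseline of 4 windows and 4 documents is the Linux figure quoted above.

```python
def classify_leak(baseline, after_close, at_shutdown):
    """Classify a fuzzing session from its two object counts.

    Each argument maps a class name to a number of live objects:
    baseline    -- counts expected once only the new, empty window is open
    after_close -- counts measured after closing all other windows
    at_shutdown -- counts measured after the normal shutdown sequence
    """
    lingering = {
        cls: count - baseline.get(cls, 0)
        for cls, count in after_close.items()
        if count > baseline.get(cls, 0)
    }
    leaked_at_exit = {cls: n for cls, n in at_shutdown.items() if n > 0}
    if leaked_at_exit:
        # Objects survived shutdown: the traditional tools catch this case.
        return "leak", leaked_at_exit
    if lingering:
        # Extra objects stayed alive until shutdown but were then destroyed.
        return "leak-until-shutdown", lingering
    return "ok", {}


linux_baseline = {"nsGlobalWindow": 4, "nsDocument": 4}
print(classify_leak(
    linux_baseline,
    after_close={"nsGlobalWindow": 5, "nsDocument": 6},
    at_shutdown={"nsGlobalWindow": 0, "nsDocument": 0},
))
```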

So far, I've found two bugs where additional objects remain:

I'm glad we found the <video> leak before shipping Firefox 4!

Note that this tool can't find all types of leaks. It won't catch leak-until-page-close bugs or other leaks with relatively short lifetimes. It can't tell you if a cache is misbehaving or if cycle collection isn't being run often enough.

Next steps

Depending on how promising we think this approach is, we could:

  • Use it in more types of testing
    • Package it into a more user-friendly extension for Firefox debug builds
    • Make it a regular part of fuzzing
    • Use it for regression tests
  • Add something to Gecko that's similar but less kludgy
  • Expand the classes it will complain about
  • Debug the flakiness with smaller objects
  • Make the XUL cache respond to memory-pressure notifications

It's also possible that DEBUG_CC, and in particular its "expected to be garbage" feature, will prove able to find a superset of the leaks my tool can find.

War on Orange update

Friday, September 17th, 2010

Clint Talbert organized a meeting today on the topic of intermittent failures. It was well-attended by members of the Automation, Metrics, and Platform teams, but we forgot to invite the Firefox front-end team.

There was some discussion of culture and policy around intermittence. For example, David Baron promoted the idea of estimating regression ranges for intermittent failures, and backing out patches suspected of causing the failures. But most of the meeting focused on metrics and tools.

Joel Maher demonstrated Orange Factor, which calculates the average number of intermittent failures per push. It shows that the average number of oranges dropped from 5.5 in August to 4.5 in September.

Daniel Einspanjer is designing a database for storing information about Tinderbox failures. He wants to know the kinds of queries we will run so he can make the database efficient for common queries. Jeff Hammel, Jonathan Griffin, and Joel Maher will be working on a new dashboard with him.

Two key points were raised about the database. The first is that people querying "by date" are usually interested in the time of the push, not the time the test suite started running. There was some discussion of whether we need to take the branchiness of the commit DAG into account, or whether we can stick with the linearity of pushes to each central repository.

The second key point is that we don't consistently have one test result per test and push. We might have skipped the test suite because the infrastructure was overloaded, or because someone else pushed right away. Another failure (intermittent or not) might have broken the build or made an earlier test in the suite cause a crash. Contrariwise, we might have run a test suite multiple times for a single push in order to help track down intermittent failures! The database needs to capture this information in order to estimate regression ranges and failure frequencies accurately.
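A small sketch of why the number of runs matters when estimating failure frequencies. It assumes each record is a `(push_id, test_name, passed)` tuple for one actual run of a test; that record shape, the push identifiers, and the function name are hypothetical, not the planned database schema.

```python
from collections import defaultdict


def failure_rate_by_push(runs):
    """Estimate per-push failure rates from individual test runs.

    runs -- iterable of (push_id, test_name, passed) tuples, one per run.
    Returns {(push_id, test_name): failures / runs}. A push on which the
    test never ran is absent from the result rather than counted as a pass.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [failures, runs]
    for push_id, test_name, passed in runs:
        entry = totals[(push_id, test_name)]
        entry[1] += 1
        if not passed:
            entry[0] += 1
    return {key: fails / count for key, (fails, count) in totals.items()}


runs = [
    ("push1", "test_a", True),
    ("push1", "test_a", False),   # same push retriggered several times
    ("push1", "test_a", True),
    ("push1", "test_a", True),
    ("push3", "test_a", True),    # push2 was skipped, so it has no record
]
print(failure_rate_by_push(runs))
```

Treating the skipped push as a pass, or the retriggered push as a single result, would distort both the failure frequency and any regression range derived from it.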

We also discussed the sources of existing data about failures: Tinderbox logs, "star" comments attached to the logs, and bug comments created by TBPLbot (example) when a bug number in a "star" comment matches a bug number that had been suggested based on its summary. Each source of data has its own types of noise and gaps.