Archive for the 'Mozilla' Category

Fuzzers love assertions

Monday, February 3rd, 2014

Fuzzers make things go wrong.
Assertions make sure we find out.

Assertions can improve code quality in many ways, but they truly shine when combined with fuzzing. Fuzzing is normally limited to finding obvious symptoms like crashes, because it's rare to be able to tell correct behavior from incorrect behavior when the input is generated randomly. Assertions expand the scope of fuzzing to include everything they check.

Assertions can even help find crash bugs: some bugs are relatively easy for fuzzers to trigger, but only lead to crashes when additional conditions are met. A well-placed assertion can let us know every time we trigger the bug.

Fuzzing JS and DOM has found about 4000 assertion bugs, including about 300 security bugs.

Asserting safe use of generic data structures

Assertions in widely-used data structures can find bugs in many callers.

  • Array indices must be within bounds. This simple precondition assert in nsTArray has caught about 90 bugs.
  • Hash tables must not be modified during enumeration. If the modification happened to resize the hash table, it would leave stack pointers dangling. This PLDHashTable assertion has caught over 50 bugs.
  • Cached values should not be out of date. When a cache's get method takes a key and a closure for computing values in the case of a cache miss, debug builds can check whether the cached values are still correct. This is effectively a form of differential testing that notices bugs in cache-invalidation logic.

Asserting module invariants

When an entire module must maintain an invariant, a single assertion can catch dozens of bugs.

Making the frame arena safer

Gecko's CSS box objects, called "frames", are created and destroyed manually. They are allocated within an arena to reduce malloc overhead and fragmentation. The arena also made it possible to reduce the risk associated with manual memory management. A combination of assertions (in debug builds) and runtime mitigations (in all builds) mitigates dangling pointer bugs that involve frames.

Requests for Gecko developers

Please add assertions, especially when:

  • A bug would be a security hole
  • Crashing is not guaranteed
  • Many callers must fulfill a precondition
  • Complex, extensive code must maintain an invariant

Also consider:

Customizing the Mozilla Manifesto

Tuesday, December 17th, 2013

I have mixed feelings about requiring Mozillians to “agree” to the Mozilla Manifesto. I get the impression that many volunteers aren’t fond of “commercial involvement” (9). Firefox development often does not live up to the ideals of absolute security (4) or transparency (8), so we’d be asking new contributors to commit to behavior for which they may have little support.

Meanwhile, the manifesto is oddly silent on two issues that many Mozillians care about deeply. First, it says little about privacy. “Shaping your own experience on the Internet” (5) suggests control over customized ads, but not control over tracking by advertisers or governments.

Second, the manifesto does not adequately address removing barriers to contribution or promoting inclusiveness in community processes. The relevant principles (6, 8) are worded as vague beliefs rather than strong values. Compare with my favorite part of the Ada Initiative FAQ:

“Open technology and culture are shaping the future of global society. If we want that society to be socially just and to serve the interests of all people, [all kinds of people] must be involved in its creation and organization.”

Rather than asking each Mozillian to agree to the entire manifesto, let’s instead encourage everyone to Likert the 10 existing principles and add a few of their own.

Indicating how you feel about each principle is more memorable than clicking “Agree” once. Each Mozillian would have a personal version of the Manifesto to remind them what drives them to contribute. Such a survey could also lead to better understanding of the community and suggest improvements to the Manifesto.

Fuzzing for consistent rendering

Saturday, March 3rd, 2012

My DOM fuzzer can now find bugs where the layout of a DOM tree depends on its history.

In this example, forcing a re-layout swapped a “1” and “3” on the screen. My fuzzer didn’t know which rendering was correct, but it could tell that Firefox was being inconsistent.

Initial DOM tree
  • DIV
    • ت
    • SPAN
      • 1
      • SPAN
      • 3
31ت
Random change:
remove the inner span
  • DIV
    • ت
    • SPAN
      • 1
      • 3
31ت
Force re-layout
  • DIV
    • ت
    • SPAN
      • 1
      • 3
13ت

Gecko developer Simon Montagu quickly determined that 13ت is the correct rendering and attached a patch. Later, when a user reported that the bug affected Persian comments on Facebook, we were able to backport Simon’s fix to Firefox 11.

How it works

The fuzzer starts by making random dynamic changes to a page. Then it compares two snapshots: one taken immediately after the dynamic changes, and another taken after also forcing a relayout.

To force a relayout, it removes the root from the document and then adds it back:

  var r = document.documentElement; 
  document.removeChild(r);
  document.appendChild(r);

Like reftest, it uses drawWindow() to take snapshots and compareCanvases() to compare them.

In theory, I could also look for bugs where dynamic changes do not repaint enough of the window. But I've been told that testing for painting invalidation bugs is tricky, so I'll wait until most of the layout bugs are fixed.

Exceptions

Since the testcases are random, I have to be heavy-handed in ignoring known bugs. If I file a rendering bug where the weirdest part of the testcase is floats, I'll have the fuzzer ignore inconsistent rendering in testcases with floats until the bug is fixed.

The current list of exceptions is fairly large and includes key web technologies:

Lessons from JS engine bugs

Thursday, September 1st, 2011

Last week, I asked Luke Wagner to explain some security bugs that he fixed in the past. I hoped to learn from each bug at multiple levels, in ways that could help prevent future security bugs from arising and persisting.

Luke is one of the developers working on Firefox's JavaScript engine, which is currently our largest source of critical security bugs.

Method

I imagined we would recurse in exhaustive breadth and exhausting depth. Instead, we recursed only on the most interesting items, and refined a checklist of starting points:

  • What was the bug?
  • What went wrong in the developer's thinking that caused the bug to be introduced?
  • What made the bug exploitable?
  • What caused us to use especially dangerous features of C++?
  • Could a new abstraction make it possible to do this both fast and safe?
  • What caused the bug to persist? Could we have caught this earlier with improved regression tests, fuzz testing, dynamic analysis, or static analysis?

Luke and I made trees for all ten bugs, at first on paper and later using EtherPad. Then I extracted and categorized what I thought were the most useful lessons and recommendations.

Recommendations for introducing fewer bugs

Casts

  • Create centralized, type-restricted cast functions. This protects you when you change the representation of one of the types. It also protects against mistakes that cause the input type to be incorrect.

Sentinel values

  • Use tagged unions instead.
  • Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not.
  • Audit existing code paths to ensure they cannot generate the special value.

Clarity of invariants

Interacting with other developers

  • If you're about to do something gross because someone else doesn't expose the right API/helper, maybe you should get it exposed.

JS Engine specific

  • Any patch that touches rooting should be reviewed by Igor.
  • Interpreter could have better abstraction and encapsulation for its stack.

Recommendations for catching bugs earlier

Static analysis

  • Find all casts (C-style casts, the reinterpret_cast keyword, and casts through unions) for a given type. Could be used to enforce centralization or to find things that should be centralized.
  • Be suspicious of a function with multiple return statements, all of which return the same primitive value.
  • Be suspicious of a function returning true/success in an OOM path.

Dynamic analysis

  • Ask Valgrind developers what they think of providing (in valgrind.h) a way to tie the addressability of "stacklike memory" to a variable that represents the end of the stack.

Fuzzing

  • We should fuzz worker threads somehow.
    • In browser (slow and messy, but it's what users are running).
    • In thread-safe shell (--enable-threadsafe?), which has "toy workers".
  • We should fuzz compartments better.
    • I should ask Blake and Andreas for help with testing compartments and wrappers.
    • I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.
  • We should give JS OOM fuzzing another shot.

Next steps

I'm curious if others have additional ideas for what could have prevented the ten bugs we looked at. For example, someone like Jeff Walden, who loves to write exhaustive regression tests, might have ideas that Luke and I did not consider.

I'd also like to do this kind of analysis with a other developers on bugs they have fixed.

On the Isle of Rapidity

Saturday, August 27th, 2011

Not all of our neighbors followed us. Some askeddemanded? — that we send back supplies.

We acknowledged their request, but our immediate task was to explore this Isle of Rapidity. What surprises would we discover? What surprises would discover us?

To survive in this strange land, we would have to befriend new neighbors. Living for so long atop Mount Annum, we had almost forgotten how to introduce ourselves.

But we had brought much to share. We had barely opened our packs when the wind seemed to whisper:

Here, gifts arrive almost before you send them.

Maybe it wouldn’t be so hard to make friends here.

And there was something inexplicably familiar about this island. Was it the scent of the flowers? The rhythmic waves in the distance? The chattering of wildlife, almost a chorus?

Here, a gift to your neighbor is equally a gift to yourself.

We felt a sudden shift in perception: the Isle of Rapidity was home.

Venturing from Mount Annum

Saturday, August 27th, 2011

Our friends thought us mad.

We had thrived atop Mount Annum. It was the highest peak as far as the eye could see.

Once, we had sprained an ankle on the Foothills of Many Betas in the west. We remembered miserable visits to the Bog of Eternal Driver Approval in the east. Every time, we had quickly returned home.

But there we were, on the summit of Mount Annum, climbing into a cannon aimed at 4°N, 6°W.

So far away, and yet oddly specific. We lit the fuse and plugged our ears.

We flew over the stifling Bog of Eternal Driver Approval. We flew over the perilous Sea of Recklessness.

We landed, as we hoped, on the uncharted Isle of Rapidity. The landing was unexpectedly soft.

We’ve shaken off the dust and gunpowder. We’ve begun to tend our wounds. We’re excited about the upcoming climb.

Rapid releases and crashes

Saturday, August 27th, 2011

In the months before Firefox's first rapid release, one concern echoed throughout engineering: crashes.

We had always relied on long stabilization periods to get crash rates down. Firefox 4 would be our last high-stability release. We hoped improvements on other aspects of quality would outweigh the decreased stability.

But then something surprising happened. We released Firefox 5, and Firefox didn’t get crashier.

VersionCrashes per 100 active daily users
3.6.201.8
4.0.11.6
5.0.11.4
6.01.6

KaiRo’s explanation parallels what I’ve seen helping with MemShrink:

  • The channel cascade gives each release 12 weeks of pure stabilization.
  • The channel audiences help by comparing alphas to alphas.
  • The short cycles enable backouts and reduce the desire to land half-baked features.

“Rapid release” doesn't mean building Firefox the way we always have, x times faster. It’s a new process that fits together in beautiful yet fragile ways.

Improving intranet compatibility

Thursday, August 25th, 2011

Some organizations are reluctant to keep their browsers up-to-date because they worry that internal websites might not be compatible.

Organization-internal sites can have unusual compatibility constraints. Many have small numbers of users, yet are highly sensitive to downtime. Some were developed with the assumption that the web would always be as static as it was in 2003.

Rapid releases help in some ways: fewer things change at a time; we can deprecate APIs before removing them; and the permanent Aurora and Beta audiences help test each new release consistently.

But frequent releases make manual testing impractical. (Let's pretend for now that the roughly-monthly security "dot releases" never broke anything.)

As with the problem of extension compatibility, overlapping releases could be part of a solution. But we should start by thinking about ways to attack compatibility problems directly.

Automated testing

A tool could scan for the use of deprecated web features, such as the “-moz-” prefixed version of border-radius. This tool, similar in spirit to the AMO validator, could be run on the website's source code or on the streams received by Firefox.

There is already a compatibility detector for HTML issues, but my intuition is that CSS and DOM compatibility problems are more common.

User testing

Not all visitors have the technical skills and motivation to report issues they encounter. In some organizations, bureaucracy can stifle communication between visitors and developers. Automating error reports could help.

It would be cool if a Firefox extension could report warnings and errors from internal sites to a central server.

Depending on the privacy findings from this extension, it could become an onerror++ API available to all web sites, similar to the Web Performance APIs. This seems more sensible than adding API-specific error reporting.

Sample contracts

We could suggest things to put in contracts for outsourced intranet development.

Often, the best solution is to align incentives. That could take the form of specifying that the developers are responsible for maintenance costs for a specified length of time.

When that isn't practical, I'd suggest specifying requirements such as:

Channel management

There is a roughly exponential distribution of home users between the Nightly, Aurora, Beta, and Release channels. This helps Mozilla and public web sites fix incompatibilities before they affect large numbers of users.

Large organizations should strive for a similar channel distribution so that internal websites benefit in the same way. It might make sense for Mozilla to provide tools to help, perhaps based on our own update service or Build Your Own Browser.

Counterintuitively, the best strategy for security-conscious organizations may be to standardize on the Beta channel, with the option to downgrade to Release in an emergency. This isn't as crazy as it sounds. Today's betas are as stable as yesteryear's release candidates, thanks to the Aurora audience and the discipline made possible by the 6-week cadence. And since Beta gets security fixes sooner, they are safer in some ways.

The loss of release overlap takes away some options from IT admins and intranet developers, but rapid releases also make possible new strategies that could be better in the long term.