Archive for the 'Fuzzing' Category

Fuzzing for consistent rendering

Saturday, March 3rd, 2012

My DOM fuzzer can now find bugs where the layout of a DOM tree depends on its history.

In this example, forcing a re-layout swapped a “1” and ”3” on the screen. My fuzzer didn’t know which rendering was correct, but it could tell that Firefox was being inconsistent.

Initial DOM tree
  • DIV
    • ت
    • SPAN
      • 1
      • SPAN
      • 3
31ت
Random change:
remove the inner span
  • DIV
    • ت
    • SPAN
      • 1
      • 3
31ت
Force re-layout
  • DIV
    • ت
    • SPAN
      • 1
      • 3
13ت

Gecko developer Simon Montagu quickly determined that 13ت is the correct rendering and attached a patch. Later, when a user reported that the bug affected Persian comments on Facebook, we were able to backport Simon’s fix to Firefox 11.

How it works

The fuzzer starts by making random dynamic changes to a page. Then it compares two snapshots: one taken immediately after the dynamic changes, and another taken after also forcing a relayout.

To force a relayout, it removes the root from the document and then adds it back:

  var r = document.documentElement; 
  document.removeChild(r);
  document.appendChild(r);

Like reftest, it uses drawWindow() to take snapshots and compareCanvases() to compare them.

In theory, I could also look for bugs where dynamic changes do not repaint enough of the window. But I've been told that testing for painting invalidation bugs is tricky, so I'll wait until most of the layout bugs are fixed.

Exceptions

Since the testcases are random, I have to be heavy-handed in ignoring known bugs. If I file a rendering bug where the weirdest part of the testcase is floats, I'll have the fuzzer ignore inconsistent rendering in testcases with floats until the bug is fixed.

The current list of exceptions is fairly large and includes key web technologies:

Fuzzing in the pool

Tuesday, November 23rd, 2010

In mid-2009, John O'Duinn offered to let my DOM fuzzer run on the same pool of machines as Firefox regression tests. I'd have an average of 20 computers running my fuzzer across a range of operating systems, and I wouldn't have to maintain the computers. All I had to do was tweak my script to play nicely with the scheduler, and not destroy the machines.

Playing nicely with the scheduler

Counter-intuitively, to maximize the amount of fuzzing, I had to minimize the duration of each fuzz job. The scheduler tries to avoid delays in the regression test jobs so developers don't go insane watching the tree. A low-priority job will be allowed to start much more often if it only takes 30 minutes.

Being limited to 30 minutes means the fuzz jobs don't have time to compile Firefox. Instead, fuzz jobs have to download Tinderbox builds like the regression test jobs do. I fixed several bugs in mozilla-central to make Tinderbox builds work for fuzzing.

I also modified the testcase reducer to split its work into 30-minute jobs. If the fuzzer finds a bug and the reducer takes longer than 30 minutes, it uploads the partially-reduced testcase, along with the reduction algorithm's state, for a subsequent job to continue reducing. To avoid race conditions between uploading and downloading, I use "ssh mv" synchronization.

Not destroying the test slaves

I wasn't trying to fill up the disks on the test slaves, really!

Early versions of my script filled up /tmp. I had incorrectly assumed that /tmp would be cleared on each reboot. Luckily, Nagios caught this before it caused serious damage.

Due to a security bug in some debug builds of Firefox, the fuzzer created randomly-named files in the home directory. This security bug has been fixed, but I'm afraid RelEng will be finding files named "undefined" and "[Object HTMLBodyElement]" for a while.

By restarting Firefox frequently, fuzzing accelerated the creation of gigantic console.log files on the slaves. We're trying to figure out whether to make debug-Firefox not create these files or make BuildBot delete them.

Results so far

Running in the test pool gets me a variety of operating systems. The fuzzer currently runs on Mac32 (10.5), Mac64 (10.6), Linux32, Linux64, and Win32. This allowed me to find a 64-bit-only bug and a Linux-only bug in October. Previously, I had mostly been testing on Mac.

The extra computational power also makes a difference. I can find regressions more quickly (which developers appreciate) and find harder-to-trigger bugs (which developers don't appreciate quite as much). I also get faster results when I change the fuzzer, such as the two convoluted testcases I got shortly after I added document.write fuzzing.

Unexpectedly, getting quick results from fuzzer changes makes me more inclined to tweak and improve to the fuzzer. I know that the change will still be fresh in my mind when I learn about its effects. This may turn out to be the most important win.

With cross-platform testing and the boost to agility, I suddenly feel a lot closer to being able to share and release the fuzzer.

How my DOM fuzzer ignores known bugs

Sunday, November 21st, 2010

When my DOM fuzzer finds a new bug, I want it to make a reduced testcase and notify me so I can file a bug report. To keep it from wasting time finding duplicates of known bugs, I maintain several ignore lists:

Some bugs are harder to distinguish based on output. In those cases, I use suppressions based on the fuzzer-generated input to Firefox:

Fixing any bug on those lists improves the fuzzer's ability to find additional bugs. But I'd like to point out a few that I'd especially like fixed:

In rare cases, I'll temporarily tell the fuzzer to skip a feature entirely:

Several bugs interfere with my ability to distinguish bugs. Luckily, they're all platform-specific, so they don't prevent me from finding cross-platform bugs.

  • Bug 610311 makes it difficult to distinguish crashes on Linux, so I ignore crashes there.
  • Bug 612093 makes it difficult to distinguish PR_Asserts and abnormal exits on Windows. (It's fixed in NSPR and needs to be merged to mozilla-central.)
  • Bug 507876 makes it difficult to distinguish too-much-recursion crashes on Mac. (But I don't currently know of any, so I'm not ignoring them at the moment!)

GCC correctness fuzzing

Wednesday, November 3rd, 2010

In 2008 I wrote about generating random JavaScript to find differences between optimization modes and differences between JavaScript engines (rough list of bugs).

How do you do this kind of testing on a language like C where the behavior of many programs is undefined per spec? John Regehr explains how in his talk Exposing Difficult Compilers Bugs With Random Testing at GCC Summit 2010.

Fuzzing talk at the Mozilla Summit

Wednesday, July 14th, 2010

At the 2010 Mozilla Summit, I talked about my JavaScript engine and DOM fuzzers, which have each found many hundreds of bugs. I also talked about the automations that keep me sane when I fuzz these complex components.

My slides are in the S5 web-based presentation format. You can click the Ø button to view the presentation in "handout mode" and see what I planned to say while each slide was up.

I shared a presentation slot with Mozilla contractor Paul Nickerson, who has a separate slide deck. He wisely saved the best part of his talk for the end: a demo of his font fuzzer causing Windows 7 to blue-screen.

CSS grammar fuzzer

Monday, March 16th, 2009

I wrote a CSS grammar fuzzer to test Gecko's CSS parser. This fuzzer's tricks:

Declarative context-free grammar. This makes it easy to add new CSS features to the fuzzer, or even use it to test grammars other than CSS. Each symbol can be a star, concatenation, or list of alternatives. Unlike a parser, a fuzzer has to make decisions about what to create, so alternatives can be given weights and stars have a suggested number of repetitions. This alone was enough to find at least one bug in Gecko.

Breaking rules. Like any good fuzzer, it doesn't always follow the given context-free grammar. Sometimes it does weird stuff, such as inserting a random symbol, to throw the parser off. I was surprised that this only found one additional bug in Gecko. Perhaps this reflects the comprehensive error handling requirements of the CSS specifications and the corresponding test suites.

Grammatical recursion. When the fuzzer notices that a symbol is the same as an ancestor, it can repeat the parts of the final string between the two symbols. This is effective at finding bugs where large input can cause a recursive algorithm to run out of stack space and crash. This found four grammar-recursion crashes in Gecko.

CSS serialization. The fuzzer makes sure that any text that comes out of the browser's stylesheet serializer survives another trip through the parser and serializer. This is the same trick jsfunfuzz uses to test the JavaScript engine decompiler. This helped fine four incorrect-serialization bugs in Gecko.

None of these Gecko bugs seem to be security holes. I shared the fuzzer with other browser vendors privately for over a month, and nobody asked me to delay the release, so I believe it didn't find security holes in other browsers either. But I think this has more to do with CSS parsing being fairly simple and self-contained than any weakness in the fuzzer.

After CanSecWest, I'm going to try using this fuzzer to generate JavaScript expressions that crash parsers that use recursion incautiously. Unless someone beats me to it, of course ;) (jsfunfuzz already creates many types of weird JavaScript, but can't look for grammatical recursion opportunities easily because it is written in a functional style rather than a declarative style.)

This fuzzer is written in JavaScript and is MIT licensed. I'd love to hear what other people manage to do with it. Get it here.

Some differences between JavaScript engines

Tuesday, December 23rd, 2008

I gave my new fuzzer a break from testing TraceMonkey by asking it to look for differences between SpiderMonkey and JavaScriptCore. I have listed them below, with SpiderMonkey output above JavaScriptCore output.

I have no idea how many of these are bugs (in SpiderMonkey or JavaScriptCore) and how many are ambiguous in the spec (intentionally or unintentionally).

Early error reporting

SpiderMonkey reports some errors at compile time that JavaScriptCore only reports at run time, if the code is actually hit. The difference is most obvious (and most likely to cause compatibility problems) if the code is skipped.

> if (false) { --1; }
S: SyntaxError: invalid decrement operand
J: (no error)
> if (false) { return; }
S: SyntaxError: return not in function
J: (no error)

instanceof

The two engines disagree about what objects are reasonable operands for the 'instanceof' operator.

> ({} instanceof {a:2})
S: typein:3: TypeError: invalid 'instanceof' operand ({a:2})
J: false
> ({} instanceof eval)
S: false
J: Exception: TypeError: instanceof called on an object with an invalid prototype property.

new with native functions

SpiderMonkey allows the "new" operator to be used with some native functions that JavaScriptCore considers non-constructors.

> new Math.sqrt(16)
S: 4
J: Exception: TypeError: Result of expression 'Math.sqrt' ... is not a constructor.
> new ({}.toString)
S: [object Object]
J: Exception: TypeError: Result of expression '({}.toString)' ... is not a constructor.
> new eval
S: typein:9: EvalError: function eval must be called directly, and not by way of a function of another name
J: Exception: TypeError: Result of expression 'eval' ... is not a constructor.

Converting between numbers and strings

> print(+'\00000027')
S: NaN
J: 0
> (1e-10).toString(16)
S: 0.000000006df37f675ef6ec
J: 0

const

There are subtle differences in handling of this new keyword.

> const d; const d;
S: TypeError: redeclaration of const d
J: (no error)
> const c = 0; print(++c);
S: 0
J: 1

Other differences

> print((function(){return arguments;})());
S: [object Object]
J: [object Arguments]
> typeof /x/
S: object
J: function

See Mozilla bug 61911, which changed this in SpiderMonkey in 2007.

Fuzzing TraceMonkey

Tuesday, December 23rd, 2008

Making JavaScript faster is important for the future of computer security. Faster scripts will allow computationally intensive applications to move to the Web. As messy as the Web's security model is, it beats the most popular alternative, which is to give hundreds of native applications access to your files. Faster scripts will also allow large parts of Firefox to be written in JavaScript, a memory-safe programming language, rather than C++, a statically typed footgun.

Mozilla's ambitious TraceMonkey project adds a just-in-time compiler to Firefox's JavaScript engine, making many scripts 3 to 30 times faster. TraceMonkey takes a non-traditional approach to JIT compilation: instead of compiling a function at a time, it compiles only a path (such as the body of a loop) at a time. This makes it possible to optimize the native code based on the actual type of each variable, which is important for dynamic languages like JavaScript.

My existing JavaScript fuzzer, jsfunfuzz, found a decent number of crash and assertion bugs in early versions of TraceMonkey. I made several changes to jsfunfuzz to help it generate code to test the JIT infrastructure heavily. For example, it now generates mixed-type arrays in order to test how the JIT deals with unexpected type changes.

Andreas Gal commented that each fuzz-generated testcase saved him nearly a day of debugging: otherwise, he'd probably have to tease a testcase out of a misbehaving complex web page. Encouraged by his comment, I looked for additional ways to help the TraceMonkey team.

JIT correctness

Last month, I wrote a new fuzzer designed to find correctness bugs. It runs a randomly-generated script in two JavaScript engines (in this case, SpiderMonkey with and without the JIT) and complains if the output is different.

It quickly found 13 bugs where the JIT caused JavaScript code to produce incorrect results. These bugs range from obvious to obscure to evil.

It even found two security bugs that jsfunfuzz had missed. One was a crash that involved a combination of language features that jsfunfuzz doesn't test heavily. The other was an uninitialized-memory-read bug, which caused the output to be random when it should have been consistent. jsfunfuzz missed the bug because it ignores most output, but the new fuzzer interpreted it as a difference between non-JIT and JIT output and brought the bug to my attention.

JIT speed

I set up the new fuzzer to compare the time needed to execute scripts and complain whenever enabling the JIT made a script run more slowly. It measures speed by letting the script run for 500ms and reporting the number of loop iterations completed in that time.

So far, it has found 4 serious bugs where the JIT makes scripts several times slower. Two of these have already been fixed, but the other two may be difficult to fix.

It has also found 10 cases where the JIT makes scripts about 10% slower. Most of these minor slowdowns are due to "trace aborts", where a piece of JavaScript is not converted to native code and stays in the interpreter. Some trace aborts are due to bugs, while others are design decisions or cases for which conversion to native code simply hasn't been implemented yet.

There is some disagreement over which trace aborts are most likely to affect real web pages. I asked members of Mozilla's QA team to scan the web in a way that can answer this question.

Interpreter speed

Mostly for fun, I also looked to see which code the JIT speeds up the most. Here's a simplified version of its answer:

for (var i = 0; i < 0x02000000; ++i) {
  d = 0x55555555;
  d++; d++; d++; d++; d++;
}

This code runs 250 times faster when the JIT is enabled. The JIT is able to achieve this gigantic speedup due to the interpreter being inefficient in dealing with undeclared variables and numbers that can't be represented as 30-bit ints.

Assertions

The JavaScript engine team has documented many of their assumptions as assertions in the code. Many of these assertions make it easier to spot dangerous bugs, because the script generated by the fuzzer doesn't have to be clever enough to actually cause a crash, only strange enough to violate an assumption. This is similar to my experience with other parts of Gecko that use assertions well.

Other JavaScript engine assertions make it easier to find severe performance bugs. Without these assertions, I'd only find these bugs when I measure speed directly, which requires drastically slowing down the tests.

More ideas

One testcase generated by my fuzzer demonstrated a combination of a JIT performance bug with a minor bytecode generation bug. I might be able to search for similar bytecode generation bugs the same way I searched for decompiler bugs: by ensuring that a function does not change when round-tripping through the decompiler. In order to do that, I'll need a new patch for making dis() return the disassembly instead of printing it.

I should be able to find some performance bugs by looking at which aborts and side exits are taken. This strategy would make some performance bugs (such as repeatedly taking a side exit) easier to spot.