Archive for the 'Differential testing' Category

Fuzzing for consistent rendering

Saturday, March 3rd, 2012

My DOM fuzzer can now find bugs where the layout of a DOM tree depends on its history.

In this example, forcing a re-layout swapped a “1” and “3” on the screen. My fuzzer didn’t know which rendering was correct, but it could tell that Firefox was being inconsistent.

Initial DOM tree
  • DIV
    • ت
    • SPAN
      • 1
      • SPAN
      • 3
31ت
Random change:
remove the inner span
  • DIV
    • ت
    • SPAN
      • 1
      • 3
31ت
Force re-layout
  • DIV
    • ت
    • SPAN
      • 1
      • 3
13ت

Gecko developer Simon Montagu quickly determined that 13ت is the correct rendering and attached a patch. Later, when a user reported that the bug affected Persian comments on Facebook, we were able to backport Simon’s fix to Firefox 11.

How it works

The fuzzer starts by making random dynamic changes to a page. Then it compares two snapshots: one taken immediately after the dynamic changes, and another taken after also forcing a relayout.

To force a relayout, it removes the root from the document and then adds it back:

  // Detach the root element, then re-attach it, so layout is rebuilt from scratch.
  var r = document.documentElement;
  document.removeChild(r);
  document.appendChild(r);

Like reftest, it uses drawWindow() to take snapshots and compareCanvases() to compare them.
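
As a rough illustration of the comparison step, here is a minimal sketch that counts differing pixels between two snapshots. It assumes each snapshot is already available as a flat array of RGBA bytes (the layout used by canvas ImageData); countDifferingPixels is a hypothetical helper written for this example, not the reftest compareCanvases() implementation.

```javascript
// Hypothetical stand-in for the snapshot comparison step.
// Each snapshot is a flat array of RGBA bytes, four entries per pixel.
function countDifferingPixels(before, after) {
  if (before.length !== after.length) {
    throw new Error("snapshots must have the same dimensions");
  }
  var differing = 0;
  for (var i = 0; i < before.length; i += 4) {
    if (before[i] !== after[i] ||
        before[i + 1] !== after[i + 1] ||
        before[i + 2] !== after[i + 2] ||
        before[i + 3] !== after[i + 3]) {
      differing++;
    }
  }
  return differing;
}

// Two 2-pixel snapshots that differ only in the second pixel.
var immediate = new Uint8ClampedArray([0, 0, 0, 255, 255, 255, 255, 255]);
var afterRelayout = new Uint8ClampedArray([0, 0, 0, 255, 0, 0, 0, 255]);
var inconsistent = countDifferingPixels(immediate, afterRelayout) > 0;
```

Any nonzero count means the two renderings disagree, even though the check cannot say which one is correct.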

In theory, I could also look for bugs where dynamic changes do not repaint enough of the window. But I've been told that testing for painting invalidation bugs is tricky, so I'll wait until most of the layout bugs are fixed.

Exceptions

Since the testcases are random, I have to be heavy-handed in ignoring known bugs. If I file a rendering bug where the weirdest part of the testcase is floats, I'll have the fuzzer ignore inconsistent rendering in testcases with floats until the bug is fixed.
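
One way such an ignore rule could look is sketched below. The KNOWN_BUGS list, its single placeholder entry, and the reasonToIgnore helper are all assumptions made for illustration; the fuzzer's real exception list is not shown here.

```javascript
// Hypothetical ignore list: each entry pairs a known bug with a pattern
// that matches testcases likely to trigger it.
var KNOWN_BUGS = [
  { reason: "known float rendering bug (placeholder entry)", pattern: /float\s*:/i }
];

// Return the reason a testcase should be skipped, or null if it should be reported.
function reasonToIgnore(testcaseSource, knownBugs) {
  for (var i = 0; i < knownBugs.length; i++) {
    if (knownBugs[i].pattern.test(testcaseSource)) {
      return knownBugs[i].reason;
    }
  }
  return null;
}

var skipped = reasonToIgnore('<div style="float: left">x</div>', KNOWN_BUGS);
var reported = reasonToIgnore("<div>x</div>", KNOWN_BUGS);
```

Matching on the testcase text is deliberately coarse: it hides some new bugs involving floats, but it keeps the fuzzer from reporting the same known bug over and over.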

The current list of exceptions is fairly large and includes key web technologies.

GCC correctness fuzzing

Wednesday, November 3rd, 2010

In 2008 I wrote about generating random JavaScript to find differences between optimization modes and differences between JavaScript engines (rough list of bugs).

How do you do this kind of testing on a language like C where the behavior of many programs is undefined per spec? John Regehr explains how in his talk Exposing Difficult Compiler Bugs With Random Testing at GCC Summit 2010.

Some differences between JavaScript engines

Tuesday, December 23rd, 2008

I gave my new fuzzer a break from testing TraceMonkey by asking it to look for differences between SpiderMonkey and JavaScriptCore. I have listed them below, with SpiderMonkey output above JavaScriptCore output.

I have no idea how many of these are bugs (in SpiderMonkey or JavaScriptCore) and how many are ambiguous in the spec (intentionally or unintentionally).

Early error reporting

SpiderMonkey reports some errors at compile time that JavaScriptCore only reports at run time, if the code is actually hit. The difference is most obvious (and most likely to cause compatibility problems) if the code is skipped.

> if (false) { --1; }
S: SyntaxError: invalid decrement operand
J: (no error)
> if (false) { return; }
S: SyntaxError: return not in function
J: (no error)

instanceof

The two engines disagree about what objects are reasonable operands for the 'instanceof' operator.

> ({} instanceof {a:2})
S: typein:3: TypeError: invalid 'instanceof' operand ({a:2})
J: false
> ({} instanceof eval)
S: false
J: Exception: TypeError: instanceof called on an object with an invalid prototype property.

new with native functions

SpiderMonkey allows the "new" operator to be used with some native functions that JavaScriptCore considers non-constructors.

> new Math.sqrt(16)
S: 4
J: Exception: TypeError: Result of expression 'Math.sqrt' ... is not a constructor.
> new ({}.toString)
S: [object Object]
J: Exception: TypeError: Result of expression '({}.toString)' ... is not a constructor.
> new eval
S: typein:9: EvalError: function eval must be called directly, and not by way of a function of another name
J: Exception: TypeError: Result of expression 'eval' ... is not a constructor.

Converting between numbers and strings

> print(+'\00000027')
S: NaN
J: 0
> (1e-10).toString(16)
S: 0.000000006df37f675ef6ec
J: 0

const

There are subtle differences in handling of this new keyword.

> const d; const d;
S: TypeError: redeclaration of const d
J: (no error)
> const c = 0; print(++c);
S: 0
J: 1

Other differences

> print((function(){return arguments;})());
S: [object Object]
J: [object Arguments]
> typeof /x/
S: object
J: function

See Mozilla bug 61911, which changed this in SpiderMonkey in 2007.

Fuzzing TraceMonkey

Tuesday, December 23rd, 2008

Making JavaScript faster is important for the future of computer security. Faster scripts will allow computationally intensive applications to move to the Web. As messy as the Web's security model is, it beats the most popular alternative, which is to give hundreds of native applications access to your files. Faster scripts will also allow large parts of Firefox to be written in JavaScript, a memory-safe programming language, rather than C++, a statically typed footgun.

Mozilla's ambitious TraceMonkey project adds a just-in-time compiler to Firefox's JavaScript engine, making many scripts 3 to 30 times faster. TraceMonkey takes a non-traditional approach to JIT compilation: instead of compiling a function at a time, it compiles only a path (such as the body of a loop) at a time. This makes it possible to optimize the native code based on the actual type of each variable, which is important for dynamic languages like JavaScript.

My existing JavaScript fuzzer, jsfunfuzz, found a decent number of crash and assertion bugs in early versions of TraceMonkey. I made several changes to jsfunfuzz to help it generate code to test the JIT infrastructure heavily. For example, it now generates mixed-type arrays in order to test how the JIT deals with unexpected type changes.
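
To make the mixed-type array idea concrete, here is a minimal sketch of a generator that emits array literals whose elements have different types. The value pool, the seeded random number generator, and the function names are assumptions for this example and are not taken from jsfunfuzz.

```javascript
// Small deterministic PRNG (a linear congruential generator), so that a
// testcase can be regenerated from its seed.
function makeRng(seed) {
  var state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // always in [0, 1)
  };
}

// Hypothetical pool of source snippets covering several value types.
var VALUE_POOL = ["0", "1.5", "'x'", "null", "undefined", "true", "{}", "0x40000000"];

// Build the source text of an array literal with randomly chosen elements.
function makeMixedTypeArray(rng, length) {
  var items = [];
  for (var i = 0; i < length; i++) {
    items.push(VALUE_POOL[Math.floor(rng() * VALUE_POOL.length)]);
  }
  return "[" + items.join(", ") + "]";
}

var arraySource = makeMixedTypeArray(makeRng(1), 5);
```

Looping over such an array makes the type seen at a given point in the loop change from one iteration to the next.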

Andreas Gal commented that each fuzz-generated testcase saved him nearly a day of debugging: otherwise, he'd probably have to tease a testcase out of a misbehaving complex web page. Encouraged by his comment, I looked for additional ways to help the TraceMonkey team.

JIT correctness

Last month, I wrote a new fuzzer designed to find correctness bugs. It runs a randomly generated script in two JavaScript engines (in this case, SpiderMonkey with and without the JIT) and complains if the output is different.

It quickly found 13 bugs where the JIT caused JavaScript code to produce incorrect results. These bugs range from obvious to obscure to evil.

It even found two security bugs that jsfunfuzz had missed. One was a crash that involved a combination of language features that jsfunfuzz doesn't test heavily. The other was an uninitialized-memory-read bug, which caused the output to be random when it should have been consistent. jsfunfuzz missed the bug because it ignores most output, but the new fuzzer interpreted it as a difference between non-JIT and JIT output and brought the bug to my attention.
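
The comparison at the heart of this section can be sketched as follows. This is a simplified, in-process illustration: the two evaluators are hypothetical functions, and buggyEval plants an artificial bug in place of a real JIT miscompilation. An actual setup would more likely run the script in two separate engine processes.

```javascript
// Run `source` with a private print() and capture everything it prints.
// An uncaught exception is recorded by its error type only, because the
// wording of error messages differs between engines.
function captureRun(source, evaluate) {
  var lines = [];
  try {
    evaluate(source, function print(value) { lines.push(String(value)); });
  } catch (e) {
    lines.push("uncaught " + (e && e.name ? e.name : "exception"));
  }
  return lines.join("\n");
}

// Run one script under two evaluators and report whether the outputs match.
function differentialTest(source, evaluateA, evaluateB) {
  var outA = captureRun(source, evaluateA);
  var outB = captureRun(source, evaluateB);
  return { same: outA === outB, outA: outA, outB: outB };
}

// Hypothetical evaluators: a faithful one, and one with a planted bug
// that turns the first increment into a decrement.
function plainEval(source, print) {
  new Function("print", source)(print);
}
function buggyEval(source, print) {
  new Function("print", source.replace("++", "--"))(print);
}

var report = differentialTest("var x = 1; x++; print(x);", plainEval, buggyEval);
```

Because the harness compares output rather than watching for crashes, it also notices output that varies between runs, which is how an uninitialized read can surface as a difference.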

JIT speed

I set up the new fuzzer to compare the time needed to execute scripts and complain whenever enabling the JIT made a script run more slowly. It measures speed by letting the script run for 500ms and reporting the number of loop iterations completed in that time.
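
A minimal sketch of this kind of measurement follows. The helper names and the tolerance parameter are assumptions for illustration; the two counts being compared would come from running the same script with and without the JIT.

```javascript
// Count how many times `body` completes within roughly `budgetMs` milliseconds.
function countIterations(body, budgetMs) {
  var iterations = 0;
  var deadline = Date.now() + budgetMs;
  while (Date.now() < deadline) {
    body();
    iterations++;
  }
  return iterations;
}

// Report a slowdown when the candidate configuration completes fewer
// iterations than the baseline by more than `tolerance` (a fraction such as 0.1).
function isSlower(baselineCount, candidateCount, tolerance) {
  return candidateCount < baselineCount * (1 - tolerance);
}

var completed = countIterations(function () { Math.sqrt(2); }, 20);
```

Counting iterations in a fixed time budget keeps every test about the same length, at the cost of some noise, which is why a tolerance is useful before complaining.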

So far, it has found 4 serious bugs where the JIT makes scripts several times slower. Two of these have already been fixed, but the other two may be difficult to fix.

It has also found 10 cases where the JIT makes scripts about 10% slower. Most of these minor slowdowns are due to "trace aborts", where a piece of JavaScript is not converted to native code and stays in the interpreter. Some trace aborts are due to bugs, while others are design decisions or cases for which conversion to native code simply hasn't been implemented yet.

There is some disagreement over which trace aborts are most likely to affect real web pages. I asked members of Mozilla's QA team to scan the web in a way that can answer this question.

Interpreter speed

Mostly for fun, I also looked to see which code the JIT speeds up the most. Here's a simplified version of its answer:

for (var i = 0; i < 0x02000000; ++i) {
  d = 0x55555555;
  d++; d++; d++; d++; d++;
}

This code runs 250 times faster when the JIT is enabled. The JIT achieves this gigantic speedup because the interpreter is inefficient at dealing with undeclared variables and with numbers that can't be represented as 30-bit ints.

Assertions

The JavaScript engine team has documented many of their assumptions as assertions in the code. Many of these assertions make it easier to spot dangerous bugs, because the script generated by the fuzzer doesn't have to be clever enough to actually cause a crash, only strange enough to violate an assumption. This is similar to my experience with other parts of Gecko that use assertions well.

Other JavaScript engine assertions make it easier to find severe performance bugs. Without these assertions, I'd only find these bugs when I measure speed directly, which requires drastically slowing down the tests.

More ideas

One testcase generated by my fuzzer demonstrated a combination of a JIT performance bug with a minor bytecode generation bug. I might be able to search for similar bytecode generation bugs the same way I searched for decompiler bugs: by ensuring that a function does not change when round-tripping through the decompiler. In order to do that, I'll need a new patch for making dis() return the disassembly instead of printing it.

I should be able to find some performance bugs by looking at which aborts and side exits are taken. This strategy would make some performance bugs (such as repeatedly taking a side exit) easier to spot.

Fuzzing for JavaScript correctness

Thursday, August 2nd, 2007

Fuzz-testing is usually only used to find crashes and assertion failures, but my JavaScript engine fuzzer goes beyond catastrophic failures when it tests the decompiler. It checks the decompiled code for signs of incorrectness in two ways.

First, it checks that the decompiled code compiles without giving syntax errors. This finds fun bugs like bug 346904 where the decompiler screwed up in an understandable way, as well as bugs like bug 351496 where the decompilation is complete nonsense.

Second, it checks that the decompiled code is canonical -- compiling and decompiling again should give the exact same representation as the original decompilation. This helps find bugs like bug 381196 where decompilation changes the meaning of the code without introducing a syntax error.
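
Both checks can be sketched in a few lines, assuming Function.prototype.toString stands in for the decompiler. That assumption matters: in engines that keep the original source text, the second check passes almost trivially, so this only illustrates the shape of the test. checkRoundTrip is a hypothetical helper, not jsfunfuzz code.

```javascript
// Check 1: does the decompiled text compile?
// Check 2: does compiling and decompiling again give exactly the same text?
function checkRoundTrip(fn) {
  var first = fn.toString();
  var recompiled;
  try {
    // Indirect eval compiles the text in global scope.
    recompiled = (0, eval)("(" + first + ")");
  } catch (e) {
    return { compiles: false, canonical: false, first: first };
  }
  var second = recompiled.toString();
  return { compiles: true, canonical: first === second, first: first, second: second };
}

var ordinary = checkRoundTrip(function (a, b) { return a + b; });
var builtin = checkRoundTrip(Math.sqrt); // its text contains "[native code]"
```

The built-in function shows what a failure of the first check looks like: its string form is not valid source, so recompiling it raises a syntax error.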

Some decompilation changes, such as bug 352068, did not change the meaning of the code and simply reflected varying amounts of optimization in the compiler. Early in the fuzzer's life, I was able to convince Brendan that it was worth fixing many of those otherwise harmless "round-trip changes" in order to make it possible to find other bugs with this method.

This pair of checks doesn't find all decompiler bugs, of course, but it finds quite a few of them. jsfunfuzz has a few other correctness checks for things like unnecessary parentheses in decompiled code and bogus results from object uneval.

Can you think of other ways to use fuzz-testing to find "correctness" bugs?