Lessons from JS engine bugs
Last week, I asked Luke Wagner to explain some security bugs that he fixed in the past. I hoped to learn from each bug at multiple levels, in ways that could help prevent future security bugs from arising and persisting.
Luke is one of the developers working on Firefox's JavaScript engine, which is currently our largest source of critical security bugs.
Method
I imagined we would recurse in exhaustive breadth and exhausting depth. Instead, we recursed only on the most interesting items, and refined a checklist of starting points:
- What was the bug?
- What went wrong in the developer's thinking that caused the bug to be introduced?
- What made the bug exploitable?
- What caused us to use especially dangerous features of C++?
- Could a new abstraction make it possible to do this both fast and safe?
- What caused the bug to persist? Could we have caught this earlier with improved regression tests, fuzz testing, dynamic analysis, or static analysis?
Luke and I made trees for all ten bugs, at first on paper and later using EtherPad. Then I extracted and categorized what I thought were the most useful lessons and recommendations.
Recommendations for introducing fewer bugs
Casts
- Create centralized, type-restricted cast functions. This protects you when you change the representation of one of the types. It also protects against mistakes that cause the input type to be incorrect.
Sentinel values
- Use tagged unions instead.
- Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not.
- Audit existing code paths to ensure they cannot generate the special value.
Clarity of invariants
- Increase use of methods named AssertInvariants
- Create an alias for JS_ASSERTION called JS_INVARIANT.
Interacting with other developers
- If you're about to do something gross because someone else doesn't expose the right API/helper, maybe you should get it exposed.
JS Engine specific
- Any patch that touches rooting should be reviewed by Igor.
- Interpreter could have better abstraction and encapsulation for its stack.
Recommendations for catching bugs earlier
Static analysis
- Find all casts (C-style casts, the reinterpret_cast keyword, and casts through unions) for a given type. Could be used to enforce centralization or to find things that should be centralized.
- Be suspicious of a function with multiple return statements, all of which return the same primitive value.
- Be suspicious of a function returning true/success in an OOM path.
Dynamic analysis
- Ask Valgrind developers what they think of providing (in valgrind.h) a way to tie the addressability of "stacklike memory" to a variable that represents the end of the stack.
Fuzzing
- We should fuzz worker threads somehow.
- In browser (slow and messy, but it's what users are running).
- In thread-safe shell (--enable-threadsafe?), which has "toy workers".
- We should fuzz compartments better.
- I should ask Blake and Andreas for help with testing compartments and wrappers.
- I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.
- We should give JS OOM fuzzing another shot.
Next steps
I'm curious if others have additional ideas for what could have prevented the ten bugs we looked at. For example, someone like Jeff Walden, who loves to write exhaustive regression tests, might have ideas that Luke and I did not consider.
I'd also like to do this kind of analysis with a other developers on bugs they have fixed.
September 2nd, 2011 at 5:17 am
About OOM fuzzing: The method that I have tried and found to be reasonably fast is to fork(), return NULL in the child and see if it terminates properly. The parent would continue without failures. (I wrote that up in http://blogs.gnome.org/otte/2007/11/03/robustness-testing/ but the links from there are all gone.) IIRC that method was something like 30x slower than normal execution of the program, but it at least was essentially O(runtime of original program).
Also, it gets tricky when your code writes to files/sockets, but for a JS engine you should be able to avoid that.