See also http://www.squarefree.com/2011/09/01/lessons-from-js-engine-bugs/
I have bolded what seem like the most useful lessons and recommendations.
Bug 626631 - WebWorker + GC crash
- nsAutoJSValHolder didn't actually root the object (when mutated)
- nsAutoJSValHolder was written by someone who didn't really understand rooting
- Any patch that touches rooting should be reviewed by Igor.
- nsAutoJSValHolder exists at all, rather than everyone using js::AutoGCRooter
- If you're about to do something gross because someone else doesn't expose the right API/helper, maybe you should get it exposed.
- We should consider fuzzing worker threads somehow.
- In browser (slow and messy, but it's what users are running).
- In thread-safe shell (--enable-threadsafe + nspr flag??), which has "toy workers".
Bug 622318
- Use and definition of pendingGuardCondition (a member variable) were 100 lines apart, making thinkoes easy
Bug 621202 - Crash in rope-flattening after string-replace
Found by LangFuzz.
- Rope code did not take into account that an initially "linear" string could become "dependent". (This transformation is known as the "extensible optimization".)
Bug 601102 - Crash when exception bubbles through same-origin compartments
Found as a topcrash and diagnosed by adding an assertion.
- Violation of the Compartment Invariant: "objects should only point to other objects in the same compartment".
- When there was only one compartment, this invariant was vacuouly true.
- It was difficult to audit all code for compliance with the invarant; we relied to some extent on assertions and fuzzing.
- The bug was not found earlier.
- It is rare for exceptions to cross compartments. JetPack happened to cause some exception objects to be long-lived.
- We don't have an assertion for compartment mismatches during GC marking.
- The bug was not found earlier through fuzzing.
- Cross-compartment fuzzing may be weak.
- I should ask Blake and Andreas for help with testing compartments and wrappers.
- I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.
Bug 584650 - gczeal compartment crash
A fatvals regression. Found by jsfunfuzz, but not quickly.
- A rooting array -- that is, an array whose purpose was to hold roots for a precise GC -- would temporarily contain uninitialized entries due to the way arrays expand.
- "GC can run at almost any time" is one of those things that requires enumerative reasoning.
- Central, scary code should be reviewed by more people?
- The bug was so subtle that even additional reviewers would probably not have caught it :(
Bug 579273 - A fatvals crash
- The switch to fatvals required updating all casts to and from jsval. A cast through a union was missed.
- Casts involving jsval were spread out.
- Increase centralization by creating restricted-cast functions. (Also protects against casting from the wrong type.)
- Enforce use of restricted-cast functions with static analysls?
- It was difficult to find all the places jsval was casted.
- C++ provides many types of reinterpret casts: C-style casts, the reinterpret_cast keyword, and casts through unions.
- Unions are especially tricky because sometimes they are used to reinterpret bits and sometimes they are implicitly tagged.
- It would be nice to have a static analysis to find all the reinterpret casts used on a given type. (Could also be used to find type pair that could benefit from cast centralization.)
Bug 564937 - fast iterators shouldn't touch regs.sp[0]
Old bug made evident by a new patch.
- Used space on stack without claiming it (without expanding the stack with a "push")
- Interpreter is old and creaky.
- Interpreter could have better abstraction and encapsulation for its stack.
- Maybe we should grep the interpreter for regs.sp[0], which is almost always wrong.
- Luke did this, and found no other violations.
- Maybe we should teach Valgrind how to understand which stack memory has been claimed.
- Hard to do through VALGRIND_RESIZEINPLACE_BLOCK because the stack end is not always manipulated through macros such as PUSH_*.
- Ask njn and sewardj what they think of providing (in valgrind.h) a way to tie the addressability of "stacklike memory" to a given memory word.
Bug 559256 - Bad OOM-handling in js_GrowSlots, js_AllocSlots
Identified through code inspection.
- JS engine tries to recover from OOM (Out Of Memory) situations.
- OOM recovery is a long-standing debate.
- Jesse argues that OOM recovery is the worst source of enumerative reasoning and source code bloat in the JS engine, and is nearly impossible to test.
- Brendan argues that removing OOM recovery would unconscionably increase JS crashes, especially in a single-process 32-bit browser.
- Jesse counters that some other part of the browser is probably going to crash anyway once the VM space is exhausted.
- Exploitable
- Dynamic index off of null can be addressable.
- Maybe the dynamic-offset point should have a null check?
- Not found through fuzzing
- OOM fuzzing might be worthwhile, at least in the short term while we still have OOM recovery.
- This would be slower than gczeal!
- When Paul Biggar gave up on exhaustive OOM testing, I interpreted this as meaning that fuzzing is also doomed. I could be wrong.
- I should ask Gary or Christian to give JS OOM fuzzing another shot.
- Not caught by static analysis
- A static analysis could be suspicious of a function with multiple return statements, all of which return the same primitive value.
- A static analysis could be suspicious of a function returning true in an OOM path.
Various bugs involving non-canonical NaN values, 2 of which were likely-exploitable
- When we changed the JS value representation to fatvals, giving special meaning to non-canonical NaNs, we missed some numeric functions that introduce non-canonical NaNs.
- When semantically overloading a numeric type, by making certain values special, do one of the following:
- Use tagged unions instead.
- Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not.
- Audit existing code paths to ensure they cannot generate the special value.