Detecting leak-until-shutdown bugs
Sunday, November 14th, 2010
Most of Mozilla's leak-detection tools work on the premise that when the application exits, no objects should remain. This strategy finds many types of leak bugs: I've used tools such as trace-refcnt to find over a hundred. But it misses bugs where an object lives longer than it should.
The worst of these are bugs where an object lives until shutdown, but is destroyed during shutdown. These leaks affect users as much as any other leak, but most of our tools don't detect them.
After reading about an SVG leak-until-shutdown bug that the traditional tools missed, I wondered if I could find more bugs of that type.
A new detector
I started with the premise that if I close all my browser windows (but open a new one so Firefox doesn't exit), the number of objects held alive should not depend on what I did in the other windows. I retrofitted my DOM fuzzer with a special exit sequence:
- Open a new, empty window
- Close all other windows
- Until memory use stabilizes:
  - Flush all in-memory caches
  - Collect all garbage (XPCOM CC + JS GC)
  - Briefly return to the event loop
- Count the remaining objects (should be constant)
- Continue with the normal shutdown sequence
- Count the remaining objects (should be 0)
If the first count of remaining objects depends on what I did earlier in the session, and the second count is 0, I've probably found a leak-until-shutdown bug.
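The exit sequence and the rule above can be sketched roughly as follows. This is an illustration in Python, not the fuzzer's actual code: the four callables, `max_rounds`, and the verdict strings are all hypothetical, and it assumes that "memory use stabilizes" can be approximated by the object counts no longer changing between rounds.

```python
from typing import Callable, Dict

Counts = Dict[str, int]  # class name -> number of live objects


def stabilize(
    flush_caches: Callable[[], None],      # hypothetical hook
    collect_garbage: Callable[[], None],   # hypothetical hook (CC + GC)
    spin_event_loop: Callable[[], None],   # hypothetical hook
    count_objects: Callable[[], Counts],   # hypothetical hook
    max_rounds: int = 10,                  # assumed safety limit
) -> Counts:
    """Repeat flush / collect / spin until two rounds give equal counts."""
    previous = None
    for _ in range(max_rounds):
        flush_caches()
        collect_garbage()
        spin_event_loop()
        current = count_objects()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("object counts did not stabilize")


def classify(before_shutdown: Counts, after_shutdown: Counts,
             baseline: Counts) -> str:
    """Apply the two-count rule to one session."""
    names = set(before_shutdown) | set(baseline)
    differs = any(before_shutdown.get(n, 0) != baseline.get(n, 0)
                  for n in names)
    leftover = any(v > 0 for v in after_shutdown.values())
    if leftover:
        return "leak-at-shutdown"      # the traditional tools catch this
    if differs:
        return "leak-until-shutdown"   # first count varies, second is 0
    return "clean"
```

Here `baseline` stands for the counts seen in a session where nothing interesting happened before the exit sequence, so any deviation suggests that earlier activity left objects behind.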
To reduce noise, I had to disable the XUL cache and restrict the counting to nsGlobalWindow and nsDocument objects. On Linux, I normally count 4 nsGlobalWindows and 4 nsDocuments.
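As a rough illustration of that filtering step, the sketch below ignores every class except the two watched ones and reports how far each is from the usual Linux count of 4. The function name, the dictionary input format, and the constants' names are assumptions made for the example; only the class names and the baseline of 4 come from the text.

```python
# Classes the detector pays attention to; everything else is treated as noise.
WATCHED = ("nsGlobalWindow", "nsDocument")

# Counts normally seen on Linux after the exit sequence (from the post).
LINUX_BASELINE = {"nsGlobalWindow": 4, "nsDocument": 4}


def extra_objects(raw_counts, baseline=LINUX_BASELINE, watched=WATCHED):
    """Return {class: delta} for watched classes that differ from baseline.

    raw_counts is a hypothetical {class name: live object count} mapping.
    """
    deltas = {}
    for name in watched:
        delta = raw_counts.get(name, 0) - baseline.get(name, 0)
        if delta != 0:
            deltas[name] = delta
    return deltas
```

An empty result means the session ended at the usual counts; a non-empty one is worth investigating, provided the shutdown count is still 0.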
So far, I've found two bugs where additional objects remain:
I'm glad we found the <video> leak before shipping Firefox 4!
Note that this tool can't find all types of leaks. It won't catch leak-until-page-close bugs or other leaks with relatively short lifetimes. It can't tell you if a cache is misbehaving or if cycle collection isn't being run often enough.
Next steps
Depending on how promising we think this approach is, we could:
- Use it in more types of testing
- Package it into a more user-friendly extension for Firefox debug builds
- Make it a regular part of fuzzing
- Use it for regression tests
- Add something to Gecko that's similar but less kludgy
- Expand the classes it will complain about
- Debug the flakiness with smaller objects
- Make the XUL cache respond to memory-pressure notifications
It's also possible that DEBUG_CC, and in particular its "expected to be garbage" feature, will prove able to find a superset of the leaks that my tool can find.