Archive for the 'Security' Category

Lessons from JS engine bugs

Thursday, September 1st, 2011

Last week, I asked Luke Wagner to explain some security bugs that he fixed in the past. I hoped to learn from each bug at multiple levels, in ways that could help prevent future security bugs from arising and persisting.

Luke is one of the developers working on Firefox's JavaScript engine, which is currently our largest source of critical security bugs.

Method

I imagined we would recurse in exhaustive breadth and exhausting depth. Instead, we recursed only on the most interesting items, and refined a checklist of starting points:

  • What was the bug?
  • What went wrong in the developer's thinking that caused the bug to be introduced?
  • What made the bug exploitable?
  • What caused us to use especially dangerous features of C++?
  • Could a new abstraction make it possible to do this both fast and safe?
  • What caused the bug to persist? Could we have caught this earlier with improved regression tests, fuzz testing, dynamic analysis, or static analysis?

Luke and I made trees for all ten bugs, at first on paper and later using EtherPad. Then I extracted and categorized what I thought were the most useful lessons and recommendations.

Recommendations for introducing fewer bugs

Casts

  • Create centralized, type-restricted cast functions. This protects you when you change the representation of one of the types. It also protects against mistakes that cause the input type to be incorrect.

Sentinel values

  • Use tagged unions instead.
  • Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not.
  • Audit existing code paths to ensure they cannot generate the special value.

Clarity of invariants

Interacting with other developers

  • If you're about to do something gross because someone else doesn't expose the right API/helper, maybe you should get it exposed.

JS Engine specific

  • Any patch that touches rooting should be reviewed by Igor.
  • Interpreter could have better abstraction and encapsulation for its stack.

Recommendations for catching bugs earlier

Static analysis

  • Find all casts (C-style casts, the reinterpret_cast keyword, and casts through unions) for a given type. Could be used to enforce centralization or to find things that should be centralized.
  • Be suspicious of a function with multiple return statements, all of which return the same primitive value.
  • Be suspicious of a function returning true/success in an OOM path.

Dynamic analysis

  • Ask Valgrind developers what they think of providing (in valgrind.h) a way to tie the addressability of "stacklike memory" to a variable that represents the end of the stack.

Fuzzing

  • We should fuzz worker threads somehow.
    • In browser (slow and messy, but it's what users are running).
    • In thread-safe shell (--enable-threadsafe?), which has "toy workers".
  • We should fuzz compartments better.
    • I should ask Blake and Andreas for help with testing compartments and wrappers.
    • I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.
  • We should give JS OOM fuzzing another shot.

Next steps

I'm curious if others have additional ideas for what could have prevented the ten bugs we looked at. For example, someone like Jeff Walden, who loves to write exhaustive regression tests, might have ideas that Luke and I did not consider.

I'd also like to do this kind of analysis with a other developers on bugs they have fixed.

Improving intranet compatibility

Thursday, August 25th, 2011

Some organizations are reluctant to keep their browsers up-to-date because they worry that internal websites might not be compatible.

Organization-internal sites can have unusual compatibility constraints. Many have small numbers of users, yet are highly sensitive to downtime. Some were developed with the assumption that the web would always be as static as it was in 2003.

Rapid releases help in some ways: fewer things change at a time; we can deprecate APIs before removing them; and the permanent Aurora and Beta audiences help test each new release consistently.

But frequent releases make manual testing impractical. (Let's pretend for now that the roughly-monthly security "dot releases" never broke anything.)

As with the problem of extension compatibility, overlapping releases could be part of a solution. But we should start by thinking about ways to attack compatibility problems directly.

Automated testing

A tool could scan for the use of deprecated web features, such as the “-moz-” prefixed version of border-radius. This tool, similar in spirit to the AMO validator, could be run on the website's source code or on the streams received by Firefox.

There is already a compatibility detector for HTML issues, but my intuition is that CSS and DOM compatibility problems are more common.

User testing

Not all visitors have the technical skills and motivation to report issues they encounter. In some organizations, bureaucracy can stifle communication between visitors and developers. Automating error reports could help.

It would be cool if a Firefox extension could report warnings and errors from internal sites to a central server.

Depending on the privacy findings from this extension, it could become an onerror++ API available to all web sites, similar to the Web Performance APIs. This seems more sensible than adding API-specific error reporting.

Sample contracts

We could suggest things to put in contracts for outsourced intranet development.

Often, the best solution is to align incentives. That could take the form of specifying that the developers are responsible for maintenance costs for a specified length of time.

When that isn't practical, I'd suggest specifying requirements such as:

Channel management

There is a roughly exponential distribution of home users between the Nightly, Aurora, Beta, and Release channels. This helps Mozilla and public web sites fix incompatibilities before they affect large numbers of users.

Large organizations should strive for a similar channel distribution so that internal websites benefit in the same way. It might make sense for Mozilla to provide tools to help, perhaps based on our own update service or Build Your Own Browser.

Counterintuitively, the best strategy for security-conscious organizations may be to standardize on the Beta channel, with the option to downgrade to Release in an emergency. This isn't as crazy as it sounds. Today's betas are as stable as yesteryear's release candidates, thanks to the Aurora audience and the discipline made possible by the 6-week cadence. And since Beta gets security fixes sooner, they are safer in some ways.

The loss of release overlap takes away some options from IT admins and intranet developers, but rapid releases also make possible new strategies that could be better in the long term.

Secure and compatible

Thursday, August 25th, 2011

Previously, I discussed some of the ways Firefox's new rapid release process improves its security. But improving Firefox's security only helps users who actually update, and some people have expressed concern that rapid releases could make it more difficult for users to keep up.

There is some consensus on how to make the update process smoother and how to reduce the number of regressions that reach users. But incompatibilities pose a tougher problem.

We don't want users to have to choose between insecure and incompatible.

There are three ways to avoid facing Firefox users with this dilemma:

  • Prevent incompatibilities from arising in the first place.
  • Hasten the discovery of incompatibilities, and fix them quickly.
  • Overlap releases by delaying the end-of-life for the old version.

Overlapping releases

For many years, Mozilla tried to prevent this dilemma by providing overlap between releases. For example, Firefox 2 was supported for six months after Firefox 3.

We observed two security-related patterns that made us reluctant to continue providing overlapping releases:

First, some fixes were never backported, so users on the "old but supported" version were not as safe as they believed. Sometimes backports didn't happen because security patches required additional backports of architectural changes. Sometimes we were scared to backport large patches because we did not have large testing audiences for old branches. Some backports didn't happen because they affected web or add-on compatibility. And sometimes backports didn't happen simply because developers aren't focused on 2-year-old code. Providing old versions gave users a false sense of security.

Second, a feedback cycle developed between users lagging behind and add-ons lagging behind. Many stayed on the old version until the end-of-life date, and then encountered surprises when they were finally forced to update. Providing old versions did not actually shield users from urgent update dilemmas.

I feel we can only seriously consider returning to overlapping releases if we can first overcome these two problems.

Improving add-on compatibility

Everything we do to improve add-on compatibility helps to break the lag cycle.

The most ambitious compatibility project is Jetpack, which aims to create a more stable API for simple add-ons. Jetpack also has a permission model that promises to reduce the load on add-on reviewers, especially around release time.

Add-ons hosted on AMO now have their maxVersion bumped automatically based on source code scans. Authors of non-AMO-hosted add-ons can use the validator by itself or run the validator locally.

This month, Fligtar announced a plan of assuming compatibility. Under this proposal, only add-ons with obvious, fatal incompatibilities (such as binary XPCOM components compiled against the wrong version) will be required to bump maxVersion.

With assumed compatibility, we will be able to make more interesting use of crowdsourced compatibility data from early adopters. Meanwhile, the successful Firefox Feedback program may expand to cover add-ons.

Breaking the add-on lag cycle

Even with improvements to add-on compatibility, some add-ons will be updated late or abandoned. In these cases, we should seek to minimize the impact on users.

In Aurora 8, opt-in for third-party add-ons allows users to shed unwanted add-ons that had been holding them back.

When users have known-incompatible add-ons, Firefox should probabilistically delay updates for a few days. Many add-ons that are incompatible on the day of the release quickly become compatible. Users of those add-ons don't need to suffer through confusing UI.

But after a week, when it is likely that the add-on has been abandoned, Firefox should disable the add-on in order to update itself. It would be dishonest to ask users to choose between security and functionality once we know the latter is temporary.

Once we've solved the largest compatibility problems, we can have a more reasonable discussion about the benefits and opportunity costs of maintaining overlapping releases.

Rapid releases and security

Thursday, August 25th, 2011

Several people have asked me whether Mozilla's move to rapid releases has helped or hurt Firefox's security.

I think the new cadence has helped security overall, but it is interesting to look at both the ways it has helped and the ways it has hurt.

Security of new features

With release trains every 6 weeks, developers feel less pressure to rush half-baked features in just before each freeze.

Combined with Curtis's leadership, the rapid release cycle has made it possible for security reviews to happen earlier and more consistently. When we need in-depth reviews or meetings, we aren't overwhelmed by needing twenty in one month.

The rapid release process also necessitated new types of coordination, such as the use of team roadmaps and feature pages. The security team is able to take advantage of the new planning process to track which features might need review, even if the developers don't come to the security team and ask.

Security reviews are also more effective now. When we feel a new feature should be held back until a security concern is fixed, we create less controversy when the delay is only 6 weeks.

Security improvements and backports

Many security improvements require architectural changes that are difficult to backport to old versions. For example:

  • Firefox 3.6 added frame poisoning, making it impossible to exploit many crashes in layout code. Before frame poisoning, the layout module was one of the largest sources of security holes.
  • Firefox 4 fixed the long-standing :visited privacy hole. Mozilla was the first to create a patch, but due to Firefox's long development cycle, several other browsers shipped fixes first.
  • Firefox 6 introduced WeakMap, making it possible for extensions to store document-specific data without leaking memory and without allowing sites to see or corrupt the information. Extension authors might not be comfortable using WeakMap until most users are on Firefox 6 or higher.

Contrary to the hopes of some Linux distros and IT departments, it is no longer possible to backport "security fixes only" and have a browser that is safe, stable, and compatible.

We're constantly adding security features and mitigations for memory safety bugs. We're constantly reducing Firefox's attack surface by rewriting components from C++ into JavaScript (and soon Rust).

Disclosure windows

One area where rapid releases may hurt is the secrecy of security patches.

We keep security bug reports private until we've shipped a fix, but we check security patches into a public repository. Checking in patches allows us to test the fixes well, but opens the possibility of an attacker reverse-engineering the bug.

We check security patches into a public repository for two reasons:

  • We cannot rely entirely on in-house testing. Because of the variety of websites and extensions, our experience is that some regressions introduced by security patches are only found by volunteer testers.
  • We cannot ship public binaries, even just for testing, based on private security patches. This would violate both the spirit and letter of some open-source licenses. It also wouldn't be effective for secrecy: attackers have been using binary patch analysis to attack unpatched Microsoft software. (And that's without access to the to the old source!)

But we can shorten and combine the windows between patch disclosure and release. Instead of landing security patches on mozilla-central and letting them reach users in the normal 12-18 weeks, we can:

  • Land security fixes to all active branches at once. This may be safer now that the oldest active branch is at most 18 weeks old compared to mozilla-central. (In contrast, the Firefox 3.6.x series is based on a branch cut 2009-08-13, making it 24 months old.) But even 18 weeks is enough to create some regression risk, and it is not clear which audiences would test fixes on each branch.
  • Accelerate the channel flow for security bugs. For example, hold security fixes in a private repository until 3 weeks before each release. Then, give them a week on nightly, a week on aurora, and a week on beta. The same channel audiences that make rapid release possible would test security fixes, just for a shorter period of time. Something like this already happens on an ad-hoc basis in many bugs; formalizing the process could make it smoother.

There are significant security wins with rapid releases, and a few new challenges. Overall, it's a much better place to be than where we were a year ago.

(I posted followups about add-on compatibility and intranet compatibility.)

Untrusted text in security dialogs

Wednesday, July 14th, 2010

I just gave a 10-minute lightning talk at SOUPS on the topic of untrusted text in security dialogs.

I've been reading Firefox security bug reports over the years, and I've collected a list of things that can go wrong in security dialogs. New security dialogs should be tested against these attacks, or preferably designed to not be dialogs.

Fuzzing talk at the Mozilla Summit

Wednesday, July 14th, 2010

At the 2010 Mozilla Summit, I talked about my JavaScript engine and DOM fuzzers, which have each found many hundreds of bugs. I also talked about the automations that keep me sane when I fuzz these complex components.

My slides are in the S5 web-based presentation format. You can click the Ø button to view the presentation in "handout mode" and see what I planned to say while each slide was up.

I shared a presentation slot with Mozilla contractor Paul Nickerson, who has a separate slide deck. He wisely saved the best part of his talk for the end: a demo of his font fuzzer causing Windows 7 to blue-screen.

Simon Willison on phishing defense

Tuesday, March 2nd, 2010

If you want to stay safe from phishing and other forms of online fraud you need at least a basic understanding of a bewildering array of technologies—URLs, paths, domains, subdomains, ports, DNS, SSL as well as fundamental concepts like browsers, web sites and web servers. Misunderstand any of those concepts and you’ll be an easy target for even the most basic phishing attempts. It almost makes me uncomfortable encouraging regular people to use the web because I know they’ll be at massive risk to online fraud.

- Simon Willison

Catching regressions faster

Wednesday, February 10th, 2010

My latest post on the Mozilla Security Blog:

Fixing security holes without introducing new bugs