This page describes some things that web developers need to know about web browsers in order to create sites that don't have security holes.
This is still a draft, as you might be able to guess from the number of "TODO:" items and the poor separation of opinion from fact.
A cross-site scripting (XSS) hole is when an attacker can inject scripts into a page sent by your server. Browsers treat these injected scripts like any other script in the page.
For example, if http://www.yoursite.com/search?q=<script>alert(5)</script> returns "<p>There were no hits for <script>alert(5)</script>.</p>", your site is vulnerable to a simple XSS hole. If a malicious site links to such a URL, it can take control over the user's account on the site in several ways:
Internet Explorer supports a non-standard extension to HTTP cookies called HttpOnly. HttpOnly guards against the first version of XSS attacks, but only the first version (except in some clever cases involving multiple subdomains). Because of this, and because Firefox does not support HttpOnly (bug 178993), you should not rely on HttpOnly to protect your site and visitors.
Even on sites where users do not have accounts, XSS holes can be a problem. For example, intranet sites are automatically limited to users within the intranet, but XSS holes can allow outsiders to access them through the web browsers of employees.
For sites where the input is meant to be plain text, the fix is simple: escape everything as it is output as HTML, using an escaping mechanism appropriate to where it is.
& < > " 'as
& < > " '
\ / ' "as
Your choice of server-side libraries may affect how easy it is to get this right. If your HTML generation is based on concatenating strings and calling functions to escape variables that might contain certain characters, security depends on "remembering to escape" every variable. One alternative is to use an API where you're constructing a DOM tree out of elements and text nodes. Another is to use an API where you're opening tags, giving them attributes, closing tags, and outputting text. If implemented well, these abstractions might make server-side programming easier and improve a site's speed in addition to making the code less error-prone.
To be safe from XSS, sites should also specify the character set of every page. Browsers use all kinds of crazy heuristics to determine the character set when it is not specified, and the inputs to the heuristics are more or less control of an attacker. For example, browsers may take the character set of the linking page, byte distributions within the page, and the user's language into account. Some character sets (such as UTF-7) have ways of representing the < character without using the byte corresponding to that character in ASCII, which can lead to XSS if a browser thinks your site is UTF-7.
For sites where users are allowed to use HTML, the goal is not to escape the input, but to restrict what HTML features can be used.
The level of restriction depends on the site. A site like MySpace may decide to let users customize the appearance of their pages as much as they want. In contrast, a forum will probably limit users to P, BLOCKQUOTE, lists, simple inline styles, and perhaps images.
The naive approach of "stripping tags" using regular expressions often misses things because it interprets them differently than browsers do. For example, the regular expression <.*?> matches nothing in "<script src='http://evil.com/evil.js' </script", leaving your Internet Explorer and Firefox visitors vulnerable (see bug 226495 and for details about "half-tag" parsing). Gerv has an example involving unterminated entities. RSnake maintains an extensive list of XSS vectors that naive filtering may miss.
TODO: go through RSnake's list and make sure my advice covers everything.
The best approach is to parse the input HTML on the server, keeping only tags, attributes, and attribute values you want to allow. Upon serialization, the result will be "well-formed" HTML that browsers will parse the same way your server did.
Things you should ensure are never allowed in user-submitted HTML, to protect the accounts of visitors who use Firefox and IE:
If you must allow unsanitized, untrusted HTML to be part of your site, ensure that those pages are not on the same hostname as where other users log in. (webmail, web hosts, attachments in a bug-tracking system, Google cache) (see Gerv's proposal)
TODO: test various browsers' interpretation of setting "document.domain" to figure out exactly what "not on the same hostname" needs to mean
If you serve untrusted content as text/plain, be aware that some browsers will ignore the mime type (violating the HTTP spec) and treat it as something other than text/plain. Internet Explorer notoriously treats text/plain content as HTML if it contains anything that looks like a tag, and even Firefox does some sniffing if some of the bytes would be "control characters" in ASCII. The best solution is to treat untrusted text/plain in the same way as untrusted text/html that cannot be modified: serve it from a different hostname.
TODO: find out whether this warning should also apply to application/octet-stream or other mime types.
TODO: find out what the possible results of Firefox's sniffing are. Can it result in treating the content as text/html, or only as something to be downloaded?
Web sites should not allow users to put up forms with <input type="password">. Password managers may fill in the password, thinking the form is a legitimate part of the site.
You can probably check the "type" attribute for "input" elements against your whitelist at the same time you filter for XSS.
The exact likelihood that such an attack will succeed depends on the browser used. Firefox and Safari fill in passwords automatically. IE fills in passwords if the user types his username (imagine a form that says "Type your username to find out your Star Wars name!") or double-clicks the username field and then clicks right below it. Opera fills in passwords if the user clicks the wand icon on the toolbar. Even with Opera, users probably aren't as careful with activating the password manager as they would be when actually typing their password, since password manager does defend against ordinary phishing.
This attack was first used in November 2006 against MySpace, which has since fixed their site. Firefox and Safari developers are trying to determine whether they can mitigate the problem for sites that remain vulnerable, hopefully without crippling password management features and thus making users more vulnerable to actual phishing. (See bug 360493.)
A Cross-site request forgery hole is when a malicious site can cause a visitor's browser to make a request to your server that causes a change on the server. The server thinks that because the request comes with the user's cookies, the user wanted to submit that form.
Depending on which forms on your site are vulnerable, an attacker might be able to do the following to your victims:
Attacks can also be based on the victim's IP address rather than cookies:
Make sure form submissions that cause server-side changes use your own forms. There are two ways you can do this:
Since so few sites protect their visitors against CSRF attacks, browser makers have discussed possible client-side fixes for CSRF. We didn't come up with anything good. For reference, see bug 38933, bug 40132, bug 246476, and bug 246519. At most, browsers might be able to prevent CSRFs from web sites to intranet sites, but not between web sites.
Don't load an untrusted site in a frame. The site will be able to load data into other frames, leading to a spoofing attack. If you visitors use Internet Explorer, don't use frames at all, because other sites can replace your site's frames. See bug 246448 for more information.
Don't put passwords (or any data that should remain private) in a URL if the page at that URL contains external links. When users click those links, the entire URL is sent as a referrer. https sites are not safe, because links to other https sites include the full referrer.
If your page is designed to be displayed as an iframe on other sites, you may want specify a background color. If your page is displayed in an <iframe> and does not specify a background color, the background will be transparent, not white (bug 154957).
Even if you protect against traditional CSRF attacks, malicious sites can make users click parts of your site without realizing it. Examples of attacks include sub-reaction-time page changes, pages that load your site in an iframe and cover all but an 8px-by-8px portion of the iframe, and pages that load your site in a 99% transparent iframe. Removing "One-click purchase" buttons helps, but only a little. One possible solution to the attacks involving iframes is to add a script that detects when the page is loaded in a frame and prevents form posting. (This solution should not be confused with the classical "break out of frames" script; it needs to be faster than that.)
If you echo site-relative URLs, be careful not to echo "relative" URLs starting with //. These are interpreted as being relative to the protocol rather than relative to the site. Bugzilla once had a bug where the login form could be made to submit off-site: bug 325079.
Financial sites such as banks frequently decide to disable password manager using the autocomplete="off" attribute. I'm not sure why they do this; it seems to me that it hurts security more than it helps:
Advantages of letting users save passwords:
Disadvantages of letting users save passwords:
After users log out, encourage them to close the browser window.
To prevent the back button from reaching private pages after the use has logged out of your site (but not closed the window), use cache-control. This will piss off your users, because it prevents the browser from going Back quickly and makes some browsers forget form information upon going Back. Note that the titles of pages in session history will still be visible in the Back button drop-down.
For sites that are frequently used at public computers, consider tying the autocomplete attribute to the checkbox in your login form that controls cookie lifetime. (This checkbox is often labeled "Remember me on this computer" or "Don't ask me for my password for 2 weeks"). For example, a script might set the autocomplete attribute to "off" if the password field has received a keypress and the checkbox is unchecked. That way, users at (misconfigured) public computers are unlikely to accidentally save their passwords on the public computer, while home users will be able to use password manager.
Be aware that the DNS protocol allows bogus.attacker.com to resolve to an attacker's IP one minute and your server's IP the next. The "Princeton attack" takes advantage of this and has the same security impact as XSS. Intranet servers should check the Host header and return "400 Bad Request" if an unrecognized hostname is given, as the HTTP spec suggests but does not require. In addition, intranet firewalls should prevent DNS requests on external hostnames from resolving to internal IP addresses. Similarly, browsers, operating systems, and ISPs should all prevent external hostnames from resolving to 127.0.0.1. (Firefox briefly tried to work around Princeton attacks by caching DNS responses for longer than allowed by the DNS protocol; see bug 149943. It no longer does this; see bug 162871, bug 174590, bug 205726, and 223861.)
Maintainers of intranet sites should not neglect other security considerations, such as XSS, CSRF, and SQL injection. Firewalls do not prevent malicious sites loaded in users' browsers from trying XSS attacks.
Some security holes in web sites don't involve web browsers at all and are therefore out of scope for this page. Examples include directory traversal, buffer overflows, SQL injection, and forgetting to apply form access controls to both the page with the form and the code that handles the form. The Web Application Security Consortium's Threat Classification enumerates common server-side holes.