XSS (Cross Site Scripting) Prevention Cheat Sheet

= Introduction =

This article provides a simple positive model for preventing XSS using output escaping/encoding properly. While there are a huge number of XSS attack vectors, following a few simple rules can completely defend against this serious attack.

These rules apply to all the different varieties of XSS. Both reflected and stored XSS can be addressed by performing the appropriate escaping on the server side. The use of an escaping/encoding library like the one in ESAPI is strongly recommended, as there are many special cases. DOM-based XSS can be addressed by applying these rules to untrusted data on the client.

For a great cheatsheet on the attack vectors related to XSS, please refer to the excellent XSS Cheat Sheet by RSnake. More background on browser security and the various browsers can be found in the Browser Security Handbook.

Untrusted Data
Untrusted data is most often data that comes from the HTTP request, in the form of URL parameters, form fields, headers, or cookies. But data that comes from databases, web services, and other sources is frequently untrusted from a security perspective. That is, it might not have been perfectly validated. The OWASP Code Review Guide has a decent list of methods that return tainted data in various languages, but you should be careful about your own methods as well.

Untrusted data should always be treated as though it contains an attack. That means you should not send it anywhere without taking steps to make sure that any attacks are detected and neutralized. As applications get more and more interconnected, the likelihood of a buried attack being executed by a downstream interpreter increases rapidly.

Traditionally, input validation has been the preferred approach for handling untrusted data. However, input validation is not a great solution for injection attacks. First, input validation is typically done when the data is received, before the destination is known. That means that we don't know which characters might be significant in the target interpreter. Second, and possibly even more importantly, applications must allow potentially harmful characters in. For example, should poor Mr. O'Malley be prevented from registering in the database simply because SQL considers ' a special character?

While input validation is important and should always be performed, it is not a complete solution for injection attacks. It's better to think of input validation as defense in depth and use escaping as described below as the primary defense.

Escaping
"Escaping" is a technique used to ensure that characters are treated as data, not as characters that are relevant to the parser. There are lots of different types of escaping, sometimes confusingly called output "encoding." Some of these techniques define a special "escape" character, and other techniques have a more sophisticated syntax that involves several characters.

Do not confuse output escaping with the notion of Unicode character encoding, which involves mapping a Unicode character to a sequence of bits. This level of encoding is automatically decoded, and does not defuse attacks. However, if there are misunderstandings about the intended charset between the server and browser, it may cause unintended characters to be communicated, possibly enabling XSS attacks. This is why it is still important to specify the Unicode character encoding (charset), such as UTF-8, for all communications.
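To make the distinction concrete, here is a small JDK-only Java sketch (the class name CharsetIsNotEscaping is hypothetical) showing that encoding a payload as UTF-8 leaves its markup characters intact. The '<' character is still present as the byte 0x3C, so a browser that decodes the bytes sees the same markup again. Only escaping changes how the parser interprets the data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that charset encoding is NOT output escaping.
final class CharsetIsNotEscaping {

    // Encode a string as UTF-8 bytes, then decode those bytes back into a string.
    static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String payload = "<script>alert(1)</script>";
        byte[] utf8 = payload.getBytes(StandardCharsets.UTF_8);
        // '<' (U+003C) becomes the single byte 0x3C: the markup character is still there.
        System.out.println(Integer.toHexString(utf8[0]));           // prints 3c
        // Decoding restores the identical string, so the payload survives intact.
        System.out.println(roundTripUtf8(payload).equals(payload)); // prints true
    }
}
```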

Escaping is the primary means to make sure that untrusted data can't be used to convey an injection attack. There is no harm in escaping data - it will still render in the browser properly. Escaping merely prevents attacks from working.

Injection Theory
Injection is an attack that involves breaking out of a data context and switching into a code context through the use of special characters that are significant in the interpreter being used. XSS is a form of injection where the interpreter is the browser and attacks are buried in an HTML document. HTML is probably the worst mashup of code and data of all time, as there are so many possible places to put code and so many different valid encodings. HTML is particularly difficult because it is not only hierarchical, but also processed by many different parsers (XML, HTML, JavaScript, VBScript, CSS, URL, etc.).

To really understand what's going on with XSS, you have to consider injection into the hierarchical structure of the HTML DOM. Given a place to insert data into an HTML document (that is, a place where a developer has allowed untrusted data to be included in the DOM), there are two ways to inject code:

 * Injecting UP: The most common way is to close the current context and start a new code context. For example, this is what you do when you close an HTML attribute with a "> and start a new <script> tag. This attack closes the original context (going up in the hierarchy) and then starts a new tag that will allow script code to execute. Remember that you may be able to skip many layers up in the hierarchy when trying to break out of your current context. For example, a </script> tag may be able to terminate a script block even if it is injected inside a quoted string inside a method call inside the script. This happens because the HTML parser runs before the JavaScript parser.

 * Injecting DOWN: The less common way to perform XSS injection is to introduce a code subcontext without closing the current context. For example, if you change <img src="...UNTRUSTED DATA HERE..." /> to <img src="javascript:alert(1)" /> you do not have to break out of the HTML attribute context. Instead, you introduce a context that allows scripting within the src attribute. Another example is the expression functionality in CSS properties. Even though you may not be able to break out of a quoted CSS property to inject up, you may be able to introduce something like xss:expression(document.write(document.cookie)) without ever leaving the current context.
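The two injection shapes can be reproduced with a deliberately unsafe Java sketch that concatenates untrusted data into an img src slot with no escaping. The class and method names are hypothetical. Whether a particular browser executes a javascript: URL in an img src varies (many current browsers do not), but the same down-injection is dangerous in attributes such as href.

```java
// Hypothetical demo: untrusted data concatenated into an HTML attribute slot
// with NO escaping, to show the two injection shapes described above.
final class InjectionShapes {

    // Deliberately unsafe: builds <img src="..."> by plain string concatenation.
    static String unsafeImg(String untrusted) {
        return "<img src=\"" + untrusted + "\" />";
    }

    public static void main(String[] args) {
        // Injecting UP: the payload closes the attribute and tag, then opens <script>.
        System.out.println(unsafeImg("x\"><script>alert(1)</script>"));
        // prints <img src="x"><script>alert(1)</script>" />

        // Injecting DOWN: the payload never leaves the src attribute,
        // it just turns the value into a javascript: URL.
        System.out.println(unsafeImg("javascript:alert(1)"));
        // prints <img src="javascript:alert(1)" />
    }
}
```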

The rules in this document have been designed to prevent both UP and DOWN varieties of XSS injection. To prevent injecting up, you must escape the characters that would allow you to close the current context and start a new one. To prevent attacks that jump up several levels in the DOM hierarchy, you must also escape all the characters that are significant in all enclosing contexts. To prevent injecting down, you must escape any characters that can be used to introduce a new sub-context within the current context.

A Positive XSS Prevention Model
This article treats an HTML page like a template, with slots where a developer is allowed to put untrusted data. These slots cover the vast majority (99%?) of the common places where a developer might want to put untrusted data. Putting untrusted data in other places in the HTML is not allowed. This is a "whitelist" model, that denies everything that is not specifically allowed.

Because of the way browsers parse HTML, each of the different types of slots has slightly different security rules. When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not "escape" that slot and break into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query: the data is kept in specific places and is isolated from code contexts with escaping.

This document sets out the most common types of slots and the rules for putting untrusted data into them safely. Based on the various specifications, known XSS vectors, and a great deal of manual testing with all the popular browsers, we have determined that the rules proposed here are safe.

The slots are defined and a few examples of each are provided. Developers SHOULD NOT put data into any other slots without a very careful analysis to ensure that what they are doing is safe. Browser parsing is extremely tricky and many innocuous looking characters can be significant in the right context.

Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.
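A small Java sketch makes the failure mode visible. It uses a minimal, hypothetical HTML entity encoder (covering the Rule #1 characters) on data placed in an unquoted JavaScript slot. The payload contains no HTML-significant characters, so entity encoding leaves it untouched and the script still runs.

```java
// Hypothetical demo: HTML entity encoding applied in a JavaScript context.
final class WrongContextDemo {

    // Minimal HTML entity encoder (Rule #1 characters). '&' is replaced first
    // so the entities produced by later replacements are not escaped again.
    static String htmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;")
                .replace("/", "&#x2F;");
    }

    public static void main(String[] args) {
        // The payload contains no HTML-significant characters, so encoding leaves it unchanged.
        String payload = "alert(document.cookie)";
        String page = "<script>x=" + htmlEscape(payload) + "</script>";
        System.out.println(page); // prints <script>x=alert(document.cookie)</script>
    }
}
```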

= XSS Prevention Rules =

The following rules are intended to prevent all XSS in your application. While these rules do not allow absolute freedom in putting untrusted data into an HTML document, they should cover the vast majority of common use cases. Please add a note to the discussion page if you think we should add additional slots.

Do NOT simply escape the list of example characters provided in the various rules. It is NOT sufficient to escape only that list. Blacklist approaches are quite fragile. The whitelist rules here have been carefully designed to provide protection even against future vulnerabilities introduced by browser changes.

RULE #0 - Never Insert Untrusted Data Except in Allowed Locations
The first rule is to deny all - don't put untrusted data into your HTML document unless it is within one of the slots defined below. The reason for this rule is that there are so many strange contexts within HTML that the list of escaping rules gets very complicated. There’s no good reason to put untrusted data in these contexts.

<script>...NEVER PUT UNTRUSTED DATA HERE...</script>   directly in a script
<!--...NEVER PUT UNTRUSTED DATA HERE...-->             inside an HTML comment
<div ...NEVER PUT UNTRUSTED DATA HERE...=test />       in an attribute name
<...NEVER PUT UNTRUSTED DATA HERE... href="/test" />   in a tag name

Most importantly, never accept actual JavaScript code from an untrusted source and then run it, for example a parameter named "callback" that contains a JavaScript code snippet. No amount of escaping can fix that.

RULE #1 - HTML Escape Before Inserting Untrusted Data into HTML Element Content
Rule #1 is for when you want to put untrusted data directly into the HTML body somewhere. This includes inside normal tags like div, p, b, td, etc... Most web frameworks have a method for HTML encoding that escapes the characters detailed below. However, this is absolutely not sufficient for other HTML contexts. You need to implement the other rules detailed here as well.

<body>...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...</body>
<div>...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...</div>
any other normal HTML elements

Escape the following characters with HTML entity encoding to prevent switching into any execution context, such as script, style, or event handlers. Using hex entities is recommended in the spec. In addition to the 5 characters significant in XML, the forward slash is included as it helps to end an HTML entity.

& --> &amp;
< --> &lt;
> --> &gt;
" --> &quot;
' --> &#x27;     &apos; is not recommended
/ --> &#x2F;     forward slash is included as it helps end an HTML entity

See the ESAPI reference implementation of HTML entity escaping and unescaping.

String safe = ESAPI.encoder().encodeForHTML( request.getParameter( "input" ) );
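ESAPI's encoder remains the recommended route. For illustration only, here is a minimal JDK-only sketch of the character table above. The class name HtmlContentEscaper is hypothetical, the sketch returns null for null input (for example a missing request parameter), and it does not replace the special-case handling of a maintained library.

```java
// Hypothetical JDK-only sketch of Rule #1 (not ESAPI): HTML entity escaping
// for untrusted data placed in HTML element content.
final class HtmlContentEscaper {

    static String escape(String s) {
        if (s == null) {
            return null; // e.g. a missing request parameter
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break; // &apos; is not recommended
                case '/':  out.append("&#x2F;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Tom & 'Jerry'</b>"));
        // prints &lt;b&gt;Tom &amp; &#x27;Jerry&#x27;&lt;&#x2F;b&gt;
    }
}
```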

RULE #2 - Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes
Rule #2 is for putting untrusted data into typical attribute values like width, name, value, etc... This should not be used for complex attributes like href, src, style, or any of the event handlers like onmouseover. It is extremely important that event handler attributes should follow Rule #3 for HTML JavaScript Data Values.

<div attr=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...>content</div>     inside UNquoted attribute
<div attr='...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...'>content</div>   inside single quoted attribute
<div attr="...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">content</div>   inside double quoted attribute

Escape all characters less than 256 except alphanumeric characters with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. The reason this rule is so broad is that developers frequently leave attributes unquoted. Properly quoted attributes can only be broken out of with the corresponding quote, but unquoted attributes can be broken out of with many characters, including space, %, *, +, comma, -, /, ;, <, =, >, ^ and |.

See the ESAPI reference implementation of HTML entity escaping and unescaping.

String safe = ESAPI.encoder().encodeForHTMLAttribute( request.getParameter( "input" ) );
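The attribute rule can be sketched in plain Java as follows. The class name is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits only. The sketch always uses the numeric &#xHH; form rather than named entities and leaves characters at or above 256 unchanged, as the rule describes. Note how the space that would end an unquoted attribute becomes &#x20;.

```java
// Hypothetical JDK-only sketch of Rule #2 (not ESAPI): escape every character
// below 256 that is not an ASCII letter or digit as &#xHH;. This sketch always
// uses the numeric form and leaves characters at or above 256 unchanged.
final class HtmlAttributeEscaper {

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static String escape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 256 && !isAsciiAlphanumeric(c)) {
                out.append(String.format("&#x%02X;", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An attempt to break out of an unquoted attribute with a space.
        System.out.println("<div title=" + escape("x onmouseover=alert(1)") + ">content</div>");
        // prints <div title=x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;>content</div>
    }
}
```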

RULE #3 - JavaScript Escape Before Inserting Untrusted Data into HTML JavaScript Data Values
Rule #3 concerns the JavaScript event handlers that are specified on various HTML elements. The only safe place to put untrusted data into these event handlers is into a "data value." Including untrusted data inside these little code blocks is quite dangerous, as it is very easy to switch into an execution context, so use with caution.

<script>alert('...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...')</script>     inside a quoted string
<script>x=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...</script>            one side of an expression
<div onmouseover=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...></div>       inside UNquoted event handler
<div onmouseover='...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...'></div>     inside quoted event handler
<div onmouseover="...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE..."></div>     inside quoted event handler

Escape all characters less than 256 except alphanumeric characters with the \xHH format to prevent switching out of the data value into the script context or into another attribute. Do not use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser, which runs first. The reason this rule is so broad is that developers frequently leave event handler attributes unquoted. Properly quoted attributes can only be broken out of with the corresponding quote, but unquoted attributes can be broken out of with many characters, including space, %, *, +, comma, -, /, ;, <, =, >, ^ and |. Also, a </script> closing tag will close a script block even though it is inside a quoted string, because the HTML parser runs before the JavaScript parser.

See the ESAPI reference implementation of JavaScript escaping and unescaping.

String safe = ESAPI.encoder().encodeForJavaScript( request.getParameter( "input" ) );
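A plain-Java sketch of the \xHH rule follows. The class name is hypothetical and "alphanumeric" is assumed to mean ASCII letters and digits. As an extra assumption beyond the rule, UTF-16 units at or above 256 are written as \uHHHH escapes, which also neutralizes U+2028 and U+2029 (line terminators that older JavaScript engines reject inside string literals). A double quote becomes \x22, never the \" shortcut.

```java
// Hypothetical JDK-only sketch of Rule #3 (not ESAPI). Characters below 256 that
// are not ASCII letters or digits become \xHH. As an extra assumption beyond the
// rule, every UTF-16 unit at or above 256 becomes a four-digit unicode escape,
// which also covers U+2028 and U+2029.
final class JavaScriptDataEscaper {

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static String escape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() * 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiAlphanumeric(c)) {
                out.append(c);
            } else if (c < 256) {
                out.append(String.format("\\x%02X", (int) c)); // e.g. ' becomes \x27
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Breakout attempt against <script>alert('...')</script>
        System.out.println("<script>alert('" + escape("');alert(1)//") + "')</script>");
        // prints <script>alert('\x27\x29\x3Balert\x281\x29\x2F\x2F')</script>
    }
}
```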

RULE #4 - CSS Escape Before Inserting Untrusted Data into HTML Style Property Values
Rule #4 is for when you want to put untrusted data into a stylesheet, a style tag, or a style attribute. CSS is surprisingly powerful, and can be used for numerous attacks. Therefore, it's important that you only use untrusted data in a property value and not in other places in style data. You should stay away from putting untrusted data into complex properties like url, behavior, and custom (-moz-binding). You should also not put untrusted data into IE’s expression property value, which allows JavaScript.

<style>selector { property : ...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...; } </style>     property value
<span style="property : ...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">text</span>         property value

Use \HH for all characters less than 256 except alphanumeric. Do not use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser, which runs first. Prevent switching out of the property value and into another property or attribute. Also prevent switching into an expression or other property value that allows scripting. If the attribute is quoted, breaking out requires the corresponding quote. All attributes should be quoted, because unquoted attributes can be broken out of with many characters, including space, %, *, +, comma, -, /, ;, <, =, >, ^ and |. Also, a </style> tag will close the style block even though it is inside a quoted string, because the HTML parser runs before the CSS parser.

See the ESAPI reference implementation of CSS escaping.

String safe = ESAPI.encoder().encodeForCSS( request.getParameter( "input" ) );
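The CSS rule needs one detail that the short form \HH hides. A CSS hex escape may be up to six hex digits long, so an escape followed by a letter such as "a" would swallow it. The hypothetical sketch below therefore writes each escape as \HH followed by a single space, which the CSS tokenizer consumes as the escape terminator. As before, "alphanumeric" is assumed to mean ASCII letters and digits.

```java
// Hypothetical JDK-only sketch of Rule #4 (not ESAPI). Characters below 256 that
// are not ASCII letters or digits become \HH followed by one space. CSS hex
// escapes may be up to six digits long, so the space ends the escape: without
// it, ";a" would become "\3Ba", which CSS reads as the single character U+03BA.
final class CssValueEscaper {

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static String escape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() * 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 256 && !isAsciiAlphanumeric(c)) {
                out.append(String.format("\\%X ", (int) c)); // ';' becomes "\3B "
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attempt to end the property and start a new one inside a quoted style attribute.
        System.out.println("<span style=\"color: " + escape("red;background:url(x)") + "\">text</span>");
    }
}
```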

RULE #5 - URL Escape Before Inserting Untrusted Data into HTML URL Attributes
Rule #5 is for when you want to put untrusted data into a link to another location. This includes href and src attributes. There are a few other location attributes, but we recommend against using untrusted data in them. One important note is that using untrusted data in javascript: URLs is a very bad idea, but you could possibly use the HTML JavaScript Data Value rule (Rule #3) above.

<a href="http://...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">link</a>        a normal link
<img src='http://...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...' />            an image source
<script src="http://...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE..."></script>   a script source

Use %HH for all characters less than 256 except alphanumeric. Untrusted data should not be allowed in data: URLs, as there is no good way to defuse attacks in them with encoding. All attributes should be quoted, because unquoted attributes can be broken out of with many characters, including space, %, *, +, comma, -, /, ;, <, =, >, ^ and |. Note that entity encoding is useless in this context.

See the ESAPI reference implementation of URL escaping and unescaping.

String safe = ESAPI.encoder().encodeForURL( request.getParameter( "input" ) );   // encodeForURL declares throws EncodingException, which the caller must handle
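The %HH rule can be sketched in plain Java for the untrusted portion of a URL, such as a query parameter value; encoding a whole URL this way would also encode its "://" and break it. The class name and the example.com address are hypothetical, and the sketch assumes the string is encoded as UTF-8 first, so every byte of a non-ASCII character is percent-encoded too. By comparison, java.net.URLEncoder leaves ".", "-", "*" and "_" unencoded and turns a space into "+", so it is not a strict implementation of this rule.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of Rule #5 (not ESAPI), intended for the untrusted
// part of a URL such as a query parameter value, not for a whole URL.
// Assumption: the string is encoded as UTF-8 first, so each byte that is not an
// ASCII letter or digit (including every byte of a non-ASCII character) becomes %HH.
final class UrlParameterEscaper {

    static String escape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean alnum = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9');
            if (alnum) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A breakout attempt against a quoted src attribute stays inside the parameter.
        String q = escape("\" onerror=alert(1) x=\"");
        System.out.println("<img src=\"http://example.com/search?q=" + q + "\" />");
    }
}
```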
