Category:Encoding

Description
Encoding, which is closely related to Escaping, is a powerful mechanism that helps protect against many types of attack, especially injection attacks and XSS. Encoding translates special characters into an equivalent form that is no longer significant in the target interpreter. For example, applying HTML entity encoding to untrusted data before sending it to a browser protects against many forms of XSS.
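
As a minimal sketch of the HTML entity encoding described above, the following Python uses the standard library's html.escape. The function name render_comment and the sample payload are hypothetical and chosen only for illustration. The sketch assumes the data lands in ordinary HTML element content; other contexts (attributes, URLs, script blocks) need their own encoding.

```python
import html


def render_comment(untrusted: str) -> str:
    """Return an HTML fragment with the untrusted text entity-encoded."""
    # html.escape replaces &, < and > with entities; quote=True also
    # replaces " and ', so the browser renders the text instead of
    # parsing it as markup.
    return "<p>" + html.escape(untrusted, quote=True) + "</p>"


# Hypothetical attack string, used only to show the effect of encoding.
payload = "<script>alert('xss')</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```

The encoded output contains no literal < or > from the payload, so the browser displays the script text rather than executing it.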

Considerations:


 * What interpreter?
 * To encode properly, you need to know which interpreters the data might end up in. For example, if the data is going into a SQL interpreter, you should encode it according to the syntax of the SQL engine you are using.


 * What characters? Complete?
 * Make sure that you encode every character that might cause a problem. The best approach is a positive (allow-list) encoding scheme, in which every character outside a minimal known-good set is encoded.


 * What encoding scheme?
 * There are dozens of ways to encode characters, and many interpreters accept multiple forms of a single significant character. For a browser, HTML entity encoding is a good way to prevent script injection, but URL encoding (%xx) and Unicode escape forms will not prevent scripts from running. Be sure to use the appropriate encoding scheme for the target interpreter.


 * Double encoding and decoding?
 * Be careful not to double encode your data. In some cases, doubly encoding data can inadvertently introduce special characters in the final output. Also, be aware that some processors may automatically undo your encoding. For example, XML parsers decode the predefined entities such as &lt; and &amp; when they read a document, which can reintroduce XSS problems if the decoded data is later sent to a browser without being encoded again.
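
The positive encoding scheme and the double-encoding pitfall from the list above can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: the known-good set (ASCII letters and digits) and the name encode_for_html are hypothetical choices, and hex numeric character references are just one possible output form for an HTML context.

```python
import html
import string

# Assumed "known good" set: ASCII letters and digits only (hypothetical).
SAFE_CHARS = frozenset(string.ascii_letters + string.digits)


def encode_for_html(text: str) -> str:
    """Encode every character outside SAFE_CHARS as a hex character reference."""
    return "".join(
        ch if ch in SAFE_CHARS else f"&#x{ord(ch):X};" for ch in text
    )


once = encode_for_html("a<b")
twice = encode_for_html(once)  # double encoding, done here by mistake

print(once)   # a&#x3C;b
print(twice)  # a&#x26;&#x23;x3C&#x3B;b

# A browser decodes only once, so doubly encoded data is displayed as the
# once-encoded text ("a&#x3C;b") rather than the original "a<b".
print(html.unescape(twice) == once)  # True
```

Because everything outside the allow-list is encoded, the encoder does not depend on a complete list of dangerous characters. The second call shows why data should be encoded exactly once, at the point where it is written to the target interpreter.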

See the Category:OWASP Encoding Project for more information.

Examples
ASP.NET Example

ASP.NET pages have a "ValidateRequest" property, which is set to true by default. It helps prevent XSS-type attacks in submitted form fields: ASP.NET throws an exception if it detects script-like "unsafe" content in a request. Unfortunately, this mechanism is also triggered by certain characters you may legitimately want to receive in a request, such as > and <.

To get around this, disable request validation for the page and use Server.HtmlEncode (and its URL equivalent, Server.UrlEncode, for GET operations) to encode user input.

Related Countermeasures

 * Input Validation