Difference between revisions of "OWASP Java HTML Sanitizer Project"

(Licensing)
(Creating a HTML Policy)
 
(73 intermediate revisions by 3 users not shown)
Line 10: Line 10:
 
The existing dependencies are on guava and JSR 305. The other jars are only needed by the test suite. The JSR 305 dependency is a compile-only dependency, only needed for annotations.
 
The existing dependencies are on guava and JSR 305. The other jars are only needed by the test suite. The JSR 305 dependency is a compile-only dependency, only needed for annotations.
 
This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.
 
This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.
A great place to get started using the OWASP Java HTML Sanitizer is here: https://code.google.com/p/owasp-java-html-sanitizer/wiki/GettingStarted.
+
A great place to get started using the OWASP Java HTML Sanitizer is here: https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md.
  
 
== Benefits ==
 
== Benefits ==
* Provides 4X the speed of [https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project AntiSamy] sanitization in DOM mode and 2X the speed of AntiSamy in SAX mode.
 
 
* Very easy to use. It allows for simple programmatic POSITIVE policy configuration (see below). No XML config.
 
* Very easy to use. It allows for simple programmatic POSITIVE policy configuration (see below). No XML config.
 
* Actively maintained by Mike Samuel from Google's AppSec team!
 
* Actively maintained by Mike Samuel from Google's AppSec team!
Line 19: Line 18:
 
* This is code from the Caja project that was donated by Google. It is rather high performance and low memory utilization.
 
* This is code from the Caja project that was donated by Google. It is rather high performance and low memory utilization.
 
* Java 1.5+
 
* Java 1.5+
 +
* Provides 4X the speed of [https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project AntiSamy] sanitization in DOM mode and 2X the speed of AntiSamy in SAX mode.
  
 
==Licensing==
 
==Licensing==
The OWASP HTML Sanitizer is free to use under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2 License].
+
The OWASP HTML Sanitizer is free to use and is dual licensed under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2 License] and the [http://opensource.org/licenses/BSD-3-Clause New BSD License].
  
 
| valign="top"  style="padding-left:25px;width:200px;border-right: 1px dotted gray;padding-right:25px;" |
 
| valign="top"  style="padding-left:25px;width:200px;border-right: 1px dotted gray;padding-right:25px;" |
Line 27: Line 27:
 
== What is this? ==
 
== What is this? ==
  
The OWASP HTML Sanitizer Projects provides:
+
The OWASP HTML Sanitizer Project provides Java-based HTML sanitization of untrusted HTML!
 
+
* Java based HTML sanitization of untrusted HTML!
+
  
 
== Code Repo ==
 
== Code Repo ==
  
[https://code.google.com/p/owasp-java-html-sanitizer/ OWASP HTML Sanitizer at Google Code]
+
[https://github.com/owasp/java-html-sanitizer OWASP HTML Sanitizer at GitHub]
  
 
== Email List ==
 
== Email List ==
  
[https://groups.google.com/forum/#!forum/owasp-java-html-sanitizer-support Project Email List ]
+
Questions? Please sign up for our [https://groups.google.com/forum/#!forum/owasp-java-html-sanitizer-support Project Support List ]
  
== Project Leader ==
+
== Project Leaders ==
  
Project Leader:<br/>Mike Samuel  
+
Author/Project Leader<br/>[https://www.owasp.org/index.php/User:Mike_Samuel Mike Samuel] [mailto:mikesamuel@gmail.com @]<br/><br/>
<br/><br/>
+
Project Manager<br/>[https://www.owasp.org/index.php/User:Jmanico Jim Manico] [mailto:jim.manico@owasp.org @]
Contributors: <br/>
+
Jim Manico<br/>
+
  
 
== Related Projects ==
 
== Related Projects ==
Line 51: Line 47:
 
* [[OWASP JSON Sanitizer]]
 
* [[OWASP JSON Sanitizer]]
 
* [[OWASP Java Encoder Project]]
 
* [[OWASP Java Encoder Project]]
 +
* [[OWASP Dependency Check]]
 
* [https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project OWASP AntiSamy]  
 
* [https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project OWASP AntiSamy]  
 +
* [https://github.com/sourceclear/headlines Sourceclear Headlines]
 +
* [https://github.com/google/keyczar Google KeyCzar]
 +
* [http://shiro.apache.org/ Apache SHIRO]
 +
 +
== Ohloh ==
 +
 +
*https://www.ohloh.net/p/owasp-java-html-sanitizer
  
 
| valign="top"  style="padding-left:25px;width:200px;" |
 
| valign="top"  style="padding-left:25px;width:200px;" |
Line 57: Line 61:
 
== Quick Download ==
 
== Quick Download ==
  
* [https://code.google.com/p/owasp-java-html-sanitizer/downloads/detail?name=owasp-java-html-sanitizer-r209.zip https://code.google.com/p/owasp-java-html-sanitizer/downloads/detail?name=owasp-java-html-sanitizer-r209.zip]
+
[https://search.maven.org/#search%7Cga%7C1%7Cowasp%20html%20sanitizer OWASP HTML Sanitizer at Maven Central]<br/>
  
 
== News and Events ==
 
== News and Events ==
 +
* [28 June 2016] v20160628.1 Released
 +
* [14 Apr 2016] v20160413.1 Released
 +
* [1 May 2015] Move to GitHub
 +
* [2 July 2014] v239 Released
 +
* [3 Mar 2014] v226 Released
 
* [5 Feb 2014] New Wiki
 
* [5 Feb 2014] New Wiki
* [4 Sept 2013] 209 Released
+
* [4 Sept 2013] v209 Released
 +
 
 +
== Change Log ==
 +
For recent release notes, please visit the [https://github.com/OWASP/java-html-sanitizer/blob/master/change_log.md changelog on GitHub].
  
 
==Classifications==
 
==Classifications==
Line 81: Line 93:
 
= Creating a HTML Policy =
 
= Creating a HTML Policy =
  
You can use prepackaged policies here: [http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/distrib/javadoc/org/owasp/html/Sanitizers.html http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/distrib/javadoc/org/owasp/html/Sanitizers.html].
+
You can view a few basic prepackaged policies for links, tables, integers, images and more here: [https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/Sanitizers.java https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/Sanitizers.java].
  
 
  PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
 
  PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
 
  String safeHTML = policy.sanitize(untrustedHTML);
 
  String safeHTML = policy.sanitize(untrustedHTML);
  
or the tests show how to configure your own policy here: [http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/tests/org/owasp/html/HtmlPolicyBuilderTest.java http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/tests/org/owasp/html/HtmlPolicyBuilderTest.java]
+
These tests illustrate how to configure your own policy: [https://github.com/OWASP/java-html-sanitizer/blob/master/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java https://github.com/OWASP/java-html-sanitizer/blob/master/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java]
  
 
  PolicyFactory policy = new HtmlPolicyBuilder()
 
  PolicyFactory policy = new HtmlPolicyBuilder()
Line 96: Line 108:
 
  String safeHTML = policy.sanitize(untrustedHTML);
 
  String safeHTML = policy.sanitize(untrustedHTML);
  
or you can write custom policies to do things like changing h1s to divs with a certain class:
+
... or you can write custom policies ...
  
 
  PolicyFactory policy = new HtmlPolicyBuilder()
 
  PolicyFactory policy = new HtmlPolicyBuilder()
Line 110: Line 122:
 
     .build();
 
     .build();
 
  String safeHTML = policy.sanitize(untrustedHTML);
 
  String safeHTML = policy.sanitize(untrustedHTML);
 +
 +
Please note that the elements "a", "font", "img", "input" and "span" need to be explicitly whitelisted
 +
using the `allowWithoutAttributes()` method if you want them to be allowed through the filter when
 +
these elements do not include any attributes.
 +
 +
You can also use the default "ebay" and "slashdot" policies. The Slashdot policy (defined here https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/examples/SlashdotPolicyExample.java) allows the following tags ("a", "p", "div", "i", "b", "em", "blockquote", "tt", "strong", "br", "ul", "ol", "li") and only certain attributes. This policy also allows the custom Slashdot tags "quote" and "ecode".
 +
 +
= Inline/Embedded Images =
 +
 +
Inline images use the data URI scheme to embed images directly within web pages. The following describes how to allow inline images in an HTML Sanitizer policy.
 +
 +
1) Add the "data" protocol to your whitelist. See: https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20160628.1/org/owasp/html/HtmlPolicyBuilder.html#allowUrlProtocols
 +
 +
.allowUrlProtocols("data")
 +
 +
2) You can then allow an attribute with an extra check, as follows:
 +
 +
.allowAttributes("src")
 +
.matching(...)
 +
.onElements("img")
 +
 +
3) The matching part can be more restrictive than simply allowing every data URL; for example, it can accept only values of the following form:
 +
 +
data:image/...
 +
 +
4) Because allowUrlProtocols("data") permits data URLs wherever URLs are allowed, you may also want to add a matcher to every other URL attribute that rejects any value containing a colon unless it starts with http:, https: or mailto:
 +
 +
.allowAttributes("href")
 +
.matching(...)
 +
.onElements("a")
  
 
= Questions =
 
= Questions =
  
*How was this project tested?
+
<b>How was this project tested?</b><br/>
**This code was written with security best practices in mind, has an extensive test suite, and has undergone [https://code.google.com/p/owasp-java-html-sanitizer/wiki/AttackReviewGroundRules adversarial security review].
+
This code was written with security best practices in mind, has an extensive test suite, and has undergone [https://github.com/OWASP/java-html-sanitizer/blob/master/docs/attack_review_ground_rules.md adversarial security review].
*How is this project deployed?
+
 
**This project is best deployed through Maven [https://code.google.com/p/owasp-java-html-sanitizer/wiki/Maven https://code.google.com/p/owasp-java-html-sanitizer/wiki/Maven]
+
<b>How is this project deployed?</b><br/>
 +
This project is best deployed through Maven [https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md]
 +
 
 +
= Roadmap =
  
= About =
+
* Maintaining a fully featured HTML sanitizer is a lot of work. We intend to continue to handle community questions and bug reports in a very timely manner.
{{:Projects/OWASP Java HTML Sanitizer Project | Project About}}
+
* There are no plans for major new features other than supporting incoming requests for advanced sanitization such as additional HTML5 support.
  
 
__NOTOC__ <headertabs/>
 
__NOTOC__ <headertabs/>

Latest revision as of 16:32, 28 July 2016


OWASP HTML Sanitizer Project

The OWASP HTML Sanitizer is a fast and easy-to-configure HTML sanitizer written in Java that lets you include HTML authored by third parties in your web application while protecting against XSS. The existing dependencies are on Guava and JSR 305; the other jars are only needed by the test suite. The JSR 305 dependency is compile-only and is needed only for annotations. This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review. A great place to get started using the OWASP Java HTML Sanitizer is here: https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md.

Benefits

  • Very easy to use. It allows for simple programmatic POSITIVE policy configuration (see below). No XML config.
  • Actively maintained by Mike Samuel from Google's AppSec team!
  • Passing 95+% of AntiSamy's unit tests plus many more.
  • This is code from the Caja project that was donated by Google. It offers high performance and low memory utilization.
  • Java 1.5+
  • Provides 4X the speed of AntiSamy sanitization in DOM mode and 2X the speed of AntiSamy in SAX mode.

Licensing

The OWASP HTML Sanitizer is free to use and is dual licensed under the Apache 2 License and the New BSD License.

What is this?

The OWASP HTML Sanitizer Project provides Java-based HTML sanitization of untrusted HTML!

Code Repo

OWASP HTML Sanitizer at GitHub

Email List

Questions? Please sign up for our Project Support List

Project Leaders

Author/Project Leader
Mike Samuel @

Project Manager
Jim Manico @

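Related Projects

  • OWASP JSON Sanitizer
  • OWASP Java Encoder Project
  • OWASP Dependency Check
  • OWASP AntiSamy
  • Sourceclear Headlines
  • Google KeyCzar
  • Apache SHIRO

Ohloh

  • https://www.ohloh.net/p/owasp-java-html-sanitizer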

Quick Download

OWASP HTML Sanitizer at Maven Central

News and Events

  • [28 June 2016] v20160628.1 Released
  • [14 Apr 2016] v20160413.1 Released
  • [1 May 2015] Move to GitHub
  • [2 July 2014] v239 Released
  • [3 Mar 2014] v226 Released
  • [5 Feb 2014] New Wiki
  • [4 Sept 2013] v209 Released

Change Log

For recent release notes, please visit the changelog on GitHub.

Classifications

Incubator project; audience: builders, defenders; license: Apache 2 License; project type: code.

You can view a few basic prepackaged policies for links, tables, integers, images and more here: https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/Sanitizers.java.

PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
String safeHTML = policy.sanitize(untrustedHTML);

These tests illustrate how to configure your own policy: https://github.com/OWASP/java-html-sanitizer/blob/master/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java

PolicyFactory policy = new HtmlPolicyBuilder()
   .allowElements("a")
   .allowUrlProtocols("https")
   .allowAttributes("href").onElements("a")
   .requireRelNofollowOnLinks()
   .build();
String safeHTML = policy.sanitize(untrustedHTML);

... or you can write custom policies to do things like changing heading elements to divs with a certain class:

PolicyFactory policy = new HtmlPolicyBuilder()
   .allowElements("p")
   .allowElements(
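       new ElementPolicy() {
         public String apply(String elementName, List<String> attrs) {
           // attrs holds alternating attribute names and values.
           attrs.add("class");
           attrs.add("header-" + elementName);
           // Return the element name to emit in place of the heading.
           return "div";
         }
       }, "h1", "h2", "h3", "h4", "h5", "h6")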
   .build();
String safeHTML = policy.sanitize(untrustedHTML);
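
The effect of the custom policy above on its arguments can be seen in isolation with a small stand-in. This is only a sketch: the interface `ElementPolicyLike` is hypothetical and is declared here so that the example compiles without the sanitizer jar on the classpath; it copies the shape of the `apply` method shown above, and the lambda has the same body.

```java
import java.util.ArrayList;
import java.util.List;

final class HeadingToDivSketch {
  // Hypothetical stand-in with the same method shape as the apply(...)
  // method shown above; it is not part of the sanitizer library.
  interface ElementPolicyLike {
    String apply(String elementName, List<String> attrs);
  }

  // Same body as the anonymous policy above.
  static final ElementPolicyLike HEADING_TO_DIV = (elementName, attrs) -> {
    attrs.add("class");                  // attribute name
    attrs.add("header-" + elementName);  // attribute value
    return "div";                        // replacement element name
  };

  public static void main(String[] args) {
    List<String> attrs = new ArrayList<>();
    String renamed = HEADING_TO_DIV.apply("h2", attrs);
    System.out.println(renamed + " " + attrs);  // prints: div [class, header-h2]
  }
}
```

Under this reading, an `h2` element in the input would be emitted as a `div` carrying `class="header-h2"`.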

Please note that the elements "a", "font", "img", "input" and "span" need to be explicitly whitelisted using the `allowWithoutAttributes()` method if you want them to be allowed through the filter when these elements do not include any attributes.

You can also use the default "ebay" and "slashdot" policies. The Slashdot policy (defined here https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/examples/SlashdotPolicyExample.java) allows the following tags ("a", "p", "div", "i", "b", "em", "blockquote", "tt", "strong", "br", "ul", "ol", "li") and only certain attributes. This policy also allows the custom Slashdot tags "quote" and "ecode".

Inline images use the data URI scheme to embed images directly within web pages. The following describes how to allow inline images in an HTML Sanitizer policy.

1) Add the "data" protocol to your whitelist. See: https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20160628.1/org/owasp/html/HtmlPolicyBuilder.html#allowUrlProtocols

.allowUrlProtocols("data")

2) You can then allow an attribute with an extra check, as follows:

.allowAttributes("src")
.matching(...)
.onElements("img")

3) The matching part can be more restrictive than simply allowing every data URL; for example, it can accept only values of the following form:

data:image/...

4) Because allowUrlProtocols("data") permits data URLs wherever URLs are allowed, you may also want to add a matcher to every other URL attribute that rejects any value containing a colon unless it starts with http:, https: or mailto:

.allowAttributes("href")
.matching(...)
.onElements("a")

How was this project tested?
This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.

How is this project deployed?
This project is best deployed through Maven; see https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md
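
For reference, a Maven dependency entry might look like the following. The group ID, artifact ID and version are taken from the javadoc.io URL quoted earlier on this page and are assumed here to be the published Maven coordinates; check Maven Central for the current version before using it.

```xml
<dependency>
  <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
  <artifactId>owasp-java-html-sanitizer</artifactId>
  <version>20160628.1</version>
</dependency>
```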

  • Maintaining a fully featured HTML sanitizer is a lot of work. We intend to continue to handle community questions and bug reports in a very timely manner.
  • There are no plans for major new features other than supporting incoming requests for advanced sanitization such as additional HTML5 support.