Category:OWASP AntiSamy Project
- 1 What is it?
- 2 Who are you?
- 3 What's the difference between AntiSamy Java, .NET, etc.?
- 4 How do I get started?
- 5 Project roadmap
- 6 Presentations on AntiSamy
- 7 Contacting us
- 8 Sponsors
- 9 Project's Assessment
What is it?
Philosophically, AntiSamy is a departure from all contemporary security mechanisms. Generally, the security mechanism and user have a communication that is virtually one way, for good reason. Letting the potential attacker know details about the validation is considered unwise as it allows the attacker to "learn" and "recon" the mechanism for weaknesses. These types of information leaks can also hurt in ways you don't expect. A login mechanism that tells the user, "Username invalid" leaks the fact that a user by that name does not exist. An attacker could use a dictionary or phone book or both to remotely come up with a list of valid usernames. Using this information, the attacker could launch a brute force attack or a mass account lockout denial-of-service. So, we get that.
Unfortunately, that's just not very usable in this situation. Typical Internet users are largely ineffective when it comes to writing HTML/CSS, so where do they get their HTML from? Usually they copy it from somewhere out on the web. Simply rejecting their input without any clue as to why is jolting and annoying. Annoyed users go somewhere else to do their social networking.
Socioeconomically, AntiSamy is a have-not enabler. Private companies like Google, MySpace, eBay, etc. have come up with proprietary solutions for solving this problem. This introduces two problems. One is that proprietary solutions are not usually all that good. The other is that even when they are good, the companies are naturally reluctant to share this hard-earned IP for free. Fortunately, we just don't care. We don't see any reason why only these private companies should have this functionality, so we're releasing this for free.
The OWASP licensing policy (further explained in the membership FAQ) allows OWASP projects to be released under any approved open source license. Under these guidelines, AntiSamy is distributed under a BSD license.
Who are you?
AntiSamy was originally authored by Arshan Dabirsiaghi (arshan.dabirsiaghi [at the] gmail.com) with help from Jason Li (li.jason.c [at the] gmail.com), both of Aspect Security (http://www.aspectsecurity.com/). The problem AntiSamy solves was often described as "impossible" or "impossible to do right". The folks with the AntiSamy project hope to antiquate that idea in a hurry. As of now, there are Java and .NET implementations of AntiSamy, though the framework is implementable in any language. The Java version is callable from ColdFusion. HTMLPurifier, another free tool, is a PHP utility similar to AntiSamy and is our official suggestion for PHP. There has not been much interest in this project from the Rails community, so no implementation for Rails is being planned.
What's the difference between AntiSamy Java, .NET, etc.?
This page shows a big-picture comparison between the versions. Since it's an unfunded open source project, the ports can't be expected to mirror functionality exactly. If there's something a port is missing -- let us know, and we'll try to accommodate, or write a patch!
How do I get started?
There are four steps in the process of integrating AntiSamy. Each step is detailed in the next section, but the high-level overview follows:
- Download AntiSamy from its home on Google Code
- Choose the standard policy file that most closely matches the functionality you need
- Tailor the policy file according to your site's rules
- Call the API from the code
Stage 1 - Downloading AntiSamy
The following instructions are for AntiSamy Java, the main version. For instructions on the .NET version, see the .NET page.
Which package you download depends on what you want to do with AntiSamy. If you'd like to extend it or review the code, download the source package. If you're looking to integrate AntiSamy, you can either download the library or use Maven to include it in your build. If you want to use Maven, here's an example POM for including AntiSamy. If you want a jar file, then download antisamy-bin-X.X.X.jar (which, before version 1.2, was confusingly called "antisamy-standalone-X.X.X.jar"), which contains only the AntiSamy library. This will be the preferred choice for mature enterprise environments that don't want to be caught in classpath issues which may be introduced by the current version.
The second option, for versions before 1.2, is downloading antisamy-standalone-X.X.X.jar, which contains not only the AntiSamy code but also all necessary supporting libraries. This should only be used by applications that don't already use the libraries AntiSamy ships with, as they might introduce classpath and versioning issues.
For convenience, the download page also contains the necessary libraries for running AntiSamy in antisamy-required-libs.zip.
You can download AntiSamy from its home on Google Code.
Stage 2 - Choosing a base policy file
Chances are that your site's use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a "typical" scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let's look into the different policy files:
Slashdot (http://www.slashdot.org/) is a techie news site that allows users to respond anonymously to news posts with very limited HTML markup. Now Slashdot is not only one of the coolest sites around, it's also one that's been subject to many different successful attacks. Even more unfortunate is the fact that most of the attacks led users to the infamous goatse.cx picture (please don't go look it up). The rules for Slashdot are fairly strict: users can only submit the following HTML tags and no CSS: <b>, <u>, <i>, <a>, <blockquote>.
Accordingly, we've built a policy file that allows fairly similar functionality. All text-formatting tags that operate directly on the font, color or emphasis have been allowed.
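The allowlist idea behind this policy can be illustrated with a toy check in plain Java. This is only a sketch, not AntiSamy code and not a sanitizer: the class name SlashdotTagCheck and its method are made up for illustration, and a regular expression is no substitute for a real parser-based scan. It merely reports which tags in a submission fall outside the five Slashdot tags listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy allowlist check (hypothetical helper); illustrates the idea only and is NOT a sanitizer. */
final class SlashdotTagCheck {
    // The five tags the text lists for Slashdot.
    static final Set<String> ALLOWED = Set.of("b", "u", "i", "a", "blockquote");

    // Captures the name of an opening or closing tag, e.g. "<b>", "</a>", "<img src=...>".
    private static final Pattern TAG_NAME = Pattern.compile("</?\\s*([a-zA-Z][a-zA-Z0-9]*)");

    /** Returns the distinct tag names that are not on the allowlist, in order of first appearance. */
    static List<String> disallowedTags(String html) {
        List<String> rejected = new ArrayList<>();
        Matcher m = TAG_NAME.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!ALLOWED.contains(name) && !rejected.contains(name)) {
                rejected.add(name);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        String input = "<b>Nice post</b> <script>alert(1)</script> <IMG src=x>";
        System.out.println(disallowedTags(input)); // prints [script, img]
    }
}
```

A list like this is the kind of information that could be turned into feedback for the user, rather than silently rejecting the submission.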
eBay (http://www.ebay.com/) is the most popular online auction site in the universe, as far as I can tell. It is a public site, so anyone is allowed to post listings with rich HTML content. Given the attractiveness of eBay as a target, it's not surprising that it has been subject to a few complex XSS attacks. Listings are allowed to contain much richer content than, say, Slashdot, so its attack surface is considerably larger. The following tags appear to be accepted by eBay (they don't publish rules): <a>,...
Stage 3 - Tailoring the policy file
Smaller organizations may want to deploy AntiSamy in a default configuration, but it's equally likely that a site may want to have strict, business-driven rules for what users are allowed to submit. The discussion that decides the tailoring should also consider attack surface, which grows in proportion to the size of the policy file.
You may also want to enable/modify some "directives", which are basically advanced user options. This page tells you what the directives are and which versions support them.
Stage 4 - Calling the AntiSamy API
Using AntiSamy is abnormally easy. Here is an example of invoking AntiSamy with a policy file:
    import org.owasp.validator.html.*;

    Policy policy = Policy.getInstance(POLICY_FILE_LOCATION);
    AntiSamy as = new AntiSamy();
    CleanResults cr = as.scan(dirtyInput, policy);
    MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function
There are a few ways to create a Policy object. The getInstance() method can take any of the following:
- a String filename
- a File object
- an InputStream
Policy files can also be referenced by filename by passing a second argument to the AntiSamy.scan() method, as the following example shows:
    AntiSamy as = new AntiSamy();
    CleanResults cr = as.scan(dirtyInput, policyFilePath);
Finally, policy files can also be referenced by File objects directly in the second parameter:
    AntiSamy as = new AntiSamy();
    CleanResults cr = as.scan(dirtyInput, new File(policyFilePath));
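Whichever form is used, it can help to check the policy path before handing it to AntiSamy, so that a missing file produces a clear message early. The following is a minimal sketch in plain Java under stated assumptions: PolicyPathCheck and requireReadable are hypothetical names, the temporary file holds placeholder content rather than a real policy, and the AntiSamy calls appear only in comments so the block runs without the library.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical pre-flight helper; not part of AntiSamy. */
final class PolicyPathCheck {
    /** Fails early, with a readable message, if the policy file is missing or unreadable. */
    static File requireReadable(String policyFilePath) throws IOException {
        File f = new File(policyFilePath);
        if (!f.isFile() || !f.canRead()) {
            throw new IOException("Policy file not readable: " + f.getAbsolutePath());
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content so the sketch runs without a real policy file.
        File tmp = File.createTempFile("antisamy-policy", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<placeholder/>".getBytes(StandardCharsets.UTF_8));

        String policyFilePath = tmp.getPath();              // form 1: a String filename
        File policyFile = requireReadable(policyFilePath);  // form 2: a File object
        try (InputStream policyStream = Files.newInputStream(policyFile.toPath())) { // form 3: an InputStream
            // Any of the three could now be handed to AntiSamy, e.g.
            // Policy.getInstance(policyFilePath), Policy.getInstance(policyFile)
            // or Policy.getInstance(policyStream).
            System.out.println("First byte readable: " + (policyStream.read() != -1));
        }
    }
}
```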
Stage 5 - Analyzing CleanResults
The CleanResults object exposes the outcome of a scan through the following methods:
- getErrorMessages() - a list of String error messages
- getCleanHTML() - the clean, safe HTML output
- getCleanXMLDocumentFragment() - the clean, safe XMLDocumentFragment which is reflected in getCleanHTML()
- getScanTime() - returns the scan time in seconds
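One way an application might use these results is to store the clean HTML and show the error messages back to the user. The sketch below assumes a hypothetical stand-in class, ScanOutcome, that mirrors only the getCleanHTML() and getErrorMessages() accessors so the example compiles without AntiSamy; ProfileFeedback, the message wording, and the sample error text are likewise made up for illustration.

```java
import java.util.List;

/** Hypothetical stand-in for the two CleanResults accessors used below. */
final class ScanOutcome {
    private final String cleanHtml;
    private final List<String> errorMessages;

    ScanOutcome(String cleanHtml, List<String> errorMessages) {
        this.cleanHtml = cleanHtml;
        this.errorMessages = List.copyOf(errorMessages);
    }

    String getCleanHTML() { return cleanHtml; }
    List<String> getErrorMessages() { return errorMessages; }
}

/** Hypothetical helper that turns scan results into text for the user. */
final class ProfileFeedback {
    /** Returns a plain confirmation, or a summary with one line per reported problem. */
    static String describe(ScanOutcome outcome) {
        List<String> errors = outcome.getErrorMessages();
        if (errors.isEmpty()) {
            return "Your profile was saved as submitted.";
        }
        StringBuilder sb = new StringBuilder("Your profile was saved with " + errors.size() + " change(s):");
        for (String error : errors) {
            sb.append("\n - ").append(error);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample error text is invented; real messages come from the scan.
        ScanOutcome outcome = new ScanOutcome("<b>hi</b>", List.of("The script tag is not allowed and was removed."));
        System.out.println(describe(outcome));
    }
}
```

With the real library, the same describe() logic would read from the CleanResults object returned by scan() instead of the stand-in.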
Project roadmap
This is a labor of love, so the upgrade process may be achingly slow at times. This section details port roadmaps.
The .NET version of AntiSamy is available now at the OWASP AntiSamy .NET page. The project was funded by a Summer of Code 2008 grant and was developed primarily by Jerry Hoff with oversight from Arshan Dabirsiaghi.
A beta Python version is currently being prototyped. As more information becomes available, we will post it here. If you are interested in helping, please email me (arshan.dabirsiaghi [at the] gmail.com).
PHP version (no plans)
Although a PHP version was initially planned, we now suggest HTMLPurifier for safe rich input validation for PHP applications.
Presentations on AntiSamy
From OWASP & WASC AppSec U.S. 2007 Conference (San Jose, CA): AntiSamy - Picking a Fight with XSS (ppt) - By Arshan Dabirsiaghi - AntiSamy project lead
From OWASP AppSec Europe 2008 (Ghent, Belgium): The OWASP AntiSamy project (ppt) - By Jason Li - AntiSamy project contributor
From OWASP AppSec India 2008 (Delhi, India): Validating Rich User Content (ppt) - By Jason Li - AntiSamy project contributor
Contacting us
There are two ways of getting information on AntiSamy: the mailing list, and contacting the project lead directly.
OWASP AntiSamy mailing list
The first is the mailing list, which is located at https://lists.owasp.org/mailman/listinfo/owasp-antisamy. The list was previously private, and the archives were cleared with the release of version 1.0. We encourage all prospective and current users and bored attackers to join in the conversation. We're happy to brainstorm attack scenarios, discuss regular expressions and help with integration.
Emailing the project lead
For content which is not appropriate for the public mailing list, you can alternatively contact the project lead, Arshan Dabirsiaghi, at [arshan.dabirsiaghi] at [aspectsecurity.com] (s/ at the /@/).
Visit the Google Code issue tracker.