How to perform HTML entity encoding in Java

From OWASP
Revision as of 03:22, 27 May 2009 by Deleted user

Status

Released 14/1/2008

Overview

Injection attacks rely on the fact that interpreters take data and execute it as commands. If an attacker can modify the data that's sent to an interpreter, they may be able to make it misbehave. One way to help prevent this from happening is to encode the attacker's data in such a way that the interpreter will not get confused. HTML entity encoding is just such an encoding mechanism for many interpreters.

This is not a guarantee, by the way. It's almost certain that someone, probably from the XML/Web Services world, will create an engine that performs HTML entity decoding automatically, thus reintroducing the injection threat. For the time being, however, HTML entity encoding works well at preventing many types of injection.
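To make the first paragraph concrete, here is a minimal sketch (the class and method names are hypothetical, not from this article) that builds an HTML fragment by string concatenation, once with the raw attacker-controlled value and once with every non-alphanumeric character turned into a decimal numeric character reference:

```java
class InjectionDemo {
    // Hypothetical helper: builds an HTML fragment by concatenation,
    // optionally encoding the untrusted value first.
    static String greeting(String name, boolean encode) {
        return "<p>Hello, " + (encode ? encode(name) : name) + "</p>";
    }

    // Illustrative encoder: letters and digits pass through, every other
    // char becomes a decimal numeric character reference (&#NN;).
    static String encode(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9') {
                b.append(c);
            } else {
                b.append("&#").append((int) c).append(';');
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        String attack = "<script>alert(1)</script>";
        // Raw: the browser would parse and run the script element.
        System.out.println(greeting(attack, false));
        // Encoded: <p>Hello, &#60;script&#62;alert&#40;1&#41;&#60;&#47;script&#62;</p>
        System.out.println(greeting(attack, true));
    }
}
```

With encoding, the browser displays the payload as text instead of interpreting it as markup.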


Approach

We're going to implement a simple little method that encodes special characters. The .NET folks at Microsoft had the foresight to build this into their platform, but the Java community seems to resist adding validation to the Java EE environment despite all the security issues it could solve. View layers such as JavaServer Faces, Spring MVC, WebWork and others automatically perform HTML encoding through custom tags, but that encoding is often incomplete.

For example, Spring provides both HTML and JavaScript encoding functionality (the htmlEscape attribute on spring:message and the spring:htmlEscape tag) that can be set at the form element level.[1] HTML escape functionality in Spring can also be set at the page or servlet container level.[2] Note that Spring's default entity encoder only encodes the "big 5" and does not handle double-encoding. The code that handles this functionality, excerpted below, was last updated in 2003.[3]

  {"#39", new Integer(39)}, // ' - apostrophe
  {"quot", new Integer(34)}, // " - double-quote
  {"amp", new Integer(38)}, // & - ampersand
  {"lt", new Integer(60)}, // < - less-than
  {"gt", new Integer(62)}, // > - greater-than
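The following is a minimal sketch of a "big 5" encoder built from the five entries above. It is an illustration only, not Spring's actual implementation; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class BigFiveEncoder {
    // The five entities from the excerpt above, keyed by the character they replace.
    private static final Map<Character, String> BIG_FIVE = new HashMap<>();
    static {
        BIG_FIVE.put('\'', "&#39;");
        BIG_FIVE.put('"', "&quot;");
        BIG_FIVE.put('&', "&amp;");
        BIG_FIVE.put('<', "&lt;");
        BIG_FIVE.put('>', "&gt;");
    }

    // Replaces only the "big 5"; every other character is copied unchanged.
    static String escapeBigFive(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = BIG_FIVE.get(c);
            if (entity != null) {
                b.append(entity);
            } else {
                b.append(c);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeBigFive("a < b & \"c\""));  // a &lt; b &amp; &quot;c&quot;
        System.out.println(escapeBigFive("&lt;"));          // &amp;lt;
    }
}
```

Anything outside the five characters, such as the euro sign, passes through unchanged, and text that is already encoded gets encoded again (&lt; becomes &amp;lt;).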

Encoding the "big 5" serves exactly the purpose it was designed for: it prevents injecting HTML markup through illegal characters inside tags and attribute values. However, it does not prevent more elaborate injections, does not help with the "out-of-range characters become question marks" problem when writing Strings to Writers that use single-byte encodings, and does not prevent characters from being reinterpreted when the user switches the browser's encoding for the displayed page. --A.in.the.k 07:18, 18 March 2009 (UTC)
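The "question marks" problem can be reproduced with a plain OutputStreamWriter. This sketch assumes ISO-8859-1 as the single-byte encoding and uses hypothetical names. An OutputStreamWriter created from a Charset replaces unmappable characters with the charset's replacement, which is '?' for ISO-8859-1, whereas a numeric character reference is pure ASCII and survives any such encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SingleByteWriterDemo {
    // Writes text through a Writer using the given charset, then decodes the
    // bytes with the same charset, i.e. what the receiving side would see.
    static String roundTrip(String text, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        // "\u20ac" is the euro sign, which ISO-8859-1 cannot represent.
        String bigFiveOnly = "Price: 5\u20ac &lt;net&gt;"; // euro sign left raw
        String numeric = "Price: 5&#8364; &lt;net&gt;";   // euro sign as a numeric reference
        System.out.println(roundTrip(bigFiveOnly, StandardCharsets.ISO_8859_1)); // Price: 5? &lt;net&gt;
        System.out.println(roundTrip(numeric, StandardCharsets.ISO_8859_1));     // Price: 5&#8364; &lt;net&gt;
    }
}
```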

The best place for a more complete HTML entity encoding method is some kind of ValidationEngine, but since it is a good candidate for a static method, it doesn't much matter which class it ends up in.

Note that this implementation doesn't produce named entities such as &lt; or &gt;, but they are not difficult to add with a simple lookup table.

   /* Could return a StringBuilder and/or take a Writer parameter to write to the stream directly. */
   public static String htmlEntityEncode( String s )
   {
       int len = s.length();
       StringBuilder buf = new StringBuilder(len);
       for ( int i = 0; i < len; i++ )
       {
           char c = s.charAt( i );
           if ( c>='a' && c<='z' || c>='A' && c<='Z' || c>='0' && c<='9' )
           {
               // letters and digits are safe: copy as-is
               buf.append( c );
           }
           else
           {
               // everything else becomes a decimal numeric reference, e.g. '<' -> &#60;
               buf.append("&#").append((int)c).append(";");
           }
       }
       return buf.toString();
   }
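As the note before the method says, it never emits named entities such as &lt;. One way to add them is a lookup table consulted before falling back to a numeric reference. The table below is a small, hypothetical selection, not a complete list of HTML entities; &apos; is deliberately left out because HTML 4 does not define it, so the apostrophe still becomes &#39;.

```java
import java.util.HashMap;
import java.util.Map;

class NamedEntityEncoder {
    // Hypothetical lookup table with a few common named entities; extend as needed.
    // Characters not listed here fall back to a numeric reference.
    private static final Map<Character, String> NAMED = new HashMap<>();
    static {
        NAMED.put('<', "&lt;");
        NAMED.put('>', "&gt;");
        NAMED.put('&', "&amp;");
        NAMED.put('"', "&quot;");
        NAMED.put('\u00a0', "&nbsp;");
        NAMED.put('\u00a9', "&copy;");
    }

    static String htmlEntityEncode(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9') {
                buf.append(c); // letters and digits are safe
            } else {
                String name = NAMED.get(c);
                if (name != null) {
                    buf.append(name); // named entity from the table
                } else {
                    buf.append("&#").append((int) c).append(';'); // numeric fallback
                }
            }
        }
        return buf.toString();
    }
}
```

For example, htmlEntityEncode("a<b & 'c'") yields a&lt;b&#32;&amp;&#32;&#39;c&#39;.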

--A.in.the.k 07:25, 19 March 2009 (UTC) When testing this simple approach in several browsers and comparing it with the non-escaped version, we observed several problems, especially in the ISO control character ranges (0000-001F and 0080-009F).

MSIE 6.0 displays &#0; in its escaped form. All other browsers tested (on Windows) display the escaped range 0080-009F as incorrect, displayable characters mapped to the local OS charset (windows-1250 in the test case).

This leads to confusion and raises the question of how, and why, control characters should be output in HTML at all. I recommend removing all non-whitespace ISO control characters from the output stream.
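The recommendation can be expressed as a single predicate. One detail worth knowing: Character.isWhitespace also returns true for U+001C to U+001F (the file, group, record and unit separators), so under this rule those four control characters are escaped rather than removed, while U+0085 (NEL) is not whitespace in Java and is removed. The class and method names are hypothetical.

```java
class ControlCharFilter {
    // True for characters the recommendation drops: ISO control characters
    // (U+0000-U+001F, U+007F-U+009F) that Java does not classify as whitespace.
    static boolean shouldDrop(char ch) {
        return Character.isISOControl(ch) && !Character.isWhitespace(ch);
    }

    // Removes those characters and leaves everything else untouched.
    static String stripControls(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (!shouldDrop(ch)) {
                b.append(ch);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(shouldDrop('\u0000')); // true
        System.out.println(shouldDrop('\u0085')); // true: NEL is not whitespace in Java
        System.out.println(shouldDrop('\n'));     // false: whitespace is kept
        System.out.println(shouldDrop('\u001C')); // false: FILE SEPARATOR counts as whitespace
    }
}
```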

 public static StringBuilder escapeHtmlFull(String s)
 {
   StringBuilder b = new StringBuilder(s.length());
   for (int i = 0; i < s.length(); i++)
   {
     char ch = s.charAt(i);
     if (ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch >= '0' && ch <= '9')
     {
       // safe
       b.append(ch);
     }
     else if (Character.isWhitespace(ch))
     {
       // paranoid version: whitespaces are unsafe - escape
       // conversion of (int)ch is naive
       b.append("&#").append((int) ch).append(";");
     }
     else if (Character.isISOControl(ch))
     {
       // paranoid version: ISO control chars that are not whitespace are removed!
       // do nothing: do not include in output!
     }
     else
     {
       // paranoid version: the rest is unsafe, including ASCII punctuation below 127
       b.append("&#").append((int) ch).append(";");
     }
   }
   return b;
 }

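Another issue, introduced with Java 1.5, is support for Unicode supplementary characters. In short, a Unicode character no longer always fits in a single char: a supplementary character is stored as a surrogate pair of two chars and is handled as an int code point. The code needs to be fixed again: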

 public static StringBuilder escapeHtmlFull(String s)
 {
   StringBuilder b = new StringBuilder(s.length());
   for (int i = 0; i < s.length(); i++)
   {
     char ch = s.charAt(i);
     if (ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch >= '0' && ch <= '9')
     {
       // safe
       b.append(ch);
     }
     else if (Character.isWhitespace(ch))
     {
       // paranoid version: whitespaces are unsafe - escape
       b.append("&#").append((int) ch).append(";");
     }
     else if (Character.isISOControl(ch))
     {
       // paranoid version: ISO control chars that are not whitespace are removed!
       // do nothing: do not include in output!
     }
     else if (Character.isHighSurrogate(ch))
     {
       if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1)))
       {
         // valid surrogate pair: encode the whole supplementary code point
         int codePoint = Character.toCodePoint(ch, s.charAt(i + 1));
         if (Character.isDefined(codePoint))
         {
           b.append("&#").append(codePoint).append(";");
         }
         i++; // the low surrogate has been consumed as well
       }
       else
       {
         // unpaired high surrogate: wrong char[] sequence, drop it
         // (do not skip the next char, it may be valid)
         log("bug:isHighSurrogate");
       }
     }
     else if (Character.isLowSurrogate(ch))
     {
       // unpaired low surrogate: wrong char[] sequence, drop it
       // (do not skip the next char, it may be valid)
       log("bug:isLowSurrogate");
     }
     else
     {
       if (Character.isDefined(ch))
       {
         // paranoid version: the rest is unsafe, including ASCII punctuation below 127
         b.append("&#").append((int) ch).append(";");
       }
       // do nothing: do not include undefined chars in output!
     }
   }
   return b;
 }

 // placeholder logger (assumption: the article does not show log()); use your logging framework
 private static void log(String msg)
 {
   System.err.println(msg);
 }
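Why the surrogate handling is needed: a supplementary character such as U+1D11E (MUSICAL SYMBOL G CLEF) occupies two chars in a Java String. Encoding char by char produces two references to surrogate code points, which are not valid characters in HTML (parsers replace them with U+FFFD), while encoding by code point produces the single correct reference. A small sketch with hypothetical names:

```java
class SurrogateDemo {
    // Naive per-char encoding, as in the earlier versions: each UTF-16 unit
    // becomes its own reference, so a supplementary character is split in two.
    static String perChar(String s) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            b.append("&#").append((int) s.charAt(i)).append(';');
        }
        return b.toString();
    }

    // Per-code-point encoding: a surrogate pair becomes a single reference.
    static String perCodePoint(String s) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            b.append("&#").append(cp).append(';');
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return b.toString();
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(clef.length());                         // 2
        System.out.println(clef.codePointCount(0, clef.length())); // 1
        System.out.println(perChar(clef));      // &#55348;&#56606;
        System.out.println(perCodePoint(clef)); // &#119070;
    }
}
```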

The if-else chain and other constructs in this method could be optimized; in this article it is left as is for readability.
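As a sketch of one such optimization (not the author's code), the loop can walk code points with Character.codePointAt and Character.charCount instead of chars, which folds the surrogate branches into a single range check. It aims to keep the same policy as the method above, except that unpaired surrogates are dropped without logging; it also accepts any CharSequence. The class name is hypothetical.

```java
class HtmlEscaper {
    // Code-point-based variant of escapeHtmlFull (illustrative sketch).
    static StringBuilder escapeHtmlFull(CharSequence s) {
        StringBuilder b = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = Character.codePointAt(s, i); // joins a valid surrogate pair
            i += Character.charCount(cp);
            if (cp >= 'a' && cp <= 'z' || cp >= 'A' && cp <= 'Z' || cp >= '0' && cp <= '9') {
                b.append((char) cp);                   // safe: copy as-is
            } else if (Character.isWhitespace(cp)) {
                b.append("&#").append(cp).append(';'); // whitespace: escape
            } else if (Character.isISOControl(cp)
                    || (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE)
                    || !Character.isDefined(cp)) {
                // non-whitespace control, unpaired surrogate or undefined: drop
            } else {
                b.append("&#").append(cp).append(';'); // everything else: escape
            }
        }
        return b;
    }
}
```

For example, escapeHtmlFull("a<b c") returns a&#60;b&#32;c.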

Libraries