OWASP Code Review Guide Table of Contents

Introduction
Preface: This document is not a “How to perform a secure code review” walkthrough but rather a guide on how to perform a successful review. Knowing the mechanics of code inspection is half the battle, but I’m afraid people are the other half. To perform a proper code review, one that gives value to the client from a risk perspective rather than an academic or textbook perspective, we must understand what we are reviewing.

Applications may have faults but the client wants to know the “real risk” and not necessarily what the security textbooks say.

There are real vulnerabilities in real applications out there, and they pose real risk, but how do we define real risk as opposed to best practice?

This document describes how to get the most out of a secure code review: what is important when managing an engagement with a client, and how to keep your eye on the ball and see the “wood from the trees”.

Introduction: The only possible way of developing secure software and keeping it secure going into the future is to make security part of the design. When cars are designed, safety is considered, and it is now a big selling point for people buying a new car: “How safe is it?” is a question a potential buyer may ask. Look also at the advertising that refers to the “star” safety rating of a brand or model of car. Unfortunately the software industry is not as evolved, and hence people still buy software without paying any regard to the security of the application.

This is what OWASP is trying to do: to bring security in web application development into the mainstream, to make it a selling point. 30% to 35% of Microsoft’s budget for “Longhorn” is earmarked for security, a sign of the times. http://news.bbc.co.uk/2/hi/business/4516269.stm

Every day more and more vulnerabilities are discovered in popular applications, which we all know and use and even use for private transactions over the web.

I’m not writing this document from a purist’s point of view. You may not agree with everything, but from experience it is rare that we have the luxury of being a purist in the real world. Many forces in the business world do not see value in spending a proportion of the budget on security and factoring some security into the project timeline.

The usual one liners we hear in the wilderness:

“We never get hacked (that I know of), we don’t need security.”

“We never get hacked, we got a firewall.”

Question: “How much does security cost?” Answer: “How much shall no security cost?”

"Not to know is bad; not to wish to know is worse." - I love proverbs as you can see.

Code inspection is a fairly low-level approach to securing code but is very effective. It is in effect a look under the hood of an application (whitebox).

Buffer Overruns and Overflows
The Buffer

A buffer is an amount of contiguous memory set aside for storing information. Example: A program has to remember certain things, like what your shopping cart contains or what data was input prior to the current operation. This information is stored in memory in a buffer.

How to locate the potentially vulnerable code:

In locating potentially vulnerable code from a buffer overflow standpoint one should look for particular signatures such as:

Arrays:

int x[20];

int y[20][5];

int x[20][5][3];

Format Strings:

printf, fprintf, sprintf, snprintf.

%x, %s, %n, %d, %u, %c, %f

Overflows:

strcpy, strcat, sprintf, vsprintf

Vulnerable Patterns for buffer overflows:

‘Vanilla’ buffer overflow:

Example: A program might want to keep track of the days of the week (7). The programmer tells the computer to store a space for 7 numbers. This is an example of a buffer. But what happens if an attempt to add 8 numbers is performed?

Languages such as C and C++ do not perform bounds checking, so if the program is written in such a language the 8th piece of data would overwrite whatever lies next to the buffer in memory, resulting in data corruption.

This can cause the program to crash at a minimum, or a carefully crafted overflow can cause malicious code to be executed, as the overflow payload is actual code.

#include <string.h>

void copyData(char *userId) {
    char smallBuffer[10];           // size of 10
    strcpy(smallBuffer, userId);    // no bounds check: copies until the terminator
}

int main(int argc, char *argv[]) {
    char *userId = "01234567890";   // payload of 11 characters plus the terminator
    copyData(userId);               // this shall cause a buffer overflow
    return 0;
}

Buffer overflows are the result of stuffing more data into a buffer than it is meant to hold.

The Format String:

A format function is a function within the ANSI C specification that converts primitive C data types to a human-readable form. Format functions are used in nearly all C programs to output information, print error messages or process strings.

Some format parameters:

%x       hexadecimal (unsigned int)

%s       string ((const) (unsigned) char *)

%n       number of bytes written so far (int *)

%d       decimal (int)

%u       unsigned decimal (unsigned int)

Example:

printf ("Hello: %s\n", a273150);

The %s in this case ensures that the parameter (a273150) is printed as a string.

By supplying the format string to the format function we are able to control its behaviour. So supplying input as a format string makes our application do things it is not meant to! What exactly are we able to make the application do?

Crashing an application:

printf (User_Input);

If we supply %x (hex unsigned int) as the input, the printf function shall expect to find an integer argument relating to that format specifier, but no such argument exists. The compiler cannot know what the input will contain, so the issue only surfaces at runtime.

Walking the stack:

For every % directive the printf function finds in its format argument, it assumes that there is an associated value on the stack. In this way the function walks the stack, reading the corresponding values and printing them to the user.

Using format strings we can trigger invalid pointer accesses with a format string such as:

printf ("%s%s%s%s%s%s%s%s%s%s%s%s");

Worse again is the %n directive in printf. This directive takes an int * and writes the number of bytes written so far to that location.

Where to look for this potential vulnerability: the issue is prevalent with the printf family of functions (printf, fprintf, sprintf, snprintf), and also with syslog (which writes system log information) and setproctitle(const char *fmt, ...) (which sets the string used to display process identifier information).
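A minimal sketch of the safe pattern follows. The helper name format_user_message is hypothetical and not part of this guide; the point is only that untrusted text is passed as an argument to a constant format string and never used as the format string itself.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes untrusted text into out using a constant
 * format string, so any '%' directives inside user_input are treated as
 * ordinary characters. Returns the value reported by snprintf. */
int format_user_message(char *out, size_t out_size, const char *user_input)
{
    /* Vulnerable alternative (do not use):
     *     snprintf(out, out_size, user_input);
     */
    return snprintf(out, out_size, "%s", user_input);
}
```

With this shape, input such as %x%x%n ends up in the output verbatim instead of being interpreted. When reviewing, any call in the printf family whose format argument is not a string literal deserves a closer look.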

Integer overflows:


#include <stdio.h>

int main(void){

int val;

val = 0x7fffffff;         /* 2147483647*/

printf("val = %d (0x%x)\n", val, val);

printf("val + 1 = %d (0x%x)\n", val + 1, val + 1); /*Overflow the int*/

return 0;

}

The binary representation of 0x7fffffff is 1111111111111111111111111111111; the integer is initialised with the highest positive value a 32-bit signed integer can hold.

When we add 1 to 0x7fffffff, the value overflows and becomes a negative number (0x7fffffff + 1 = 0x80000000). Strictly speaking, signed overflow is undefined behaviour in C; the wrap to a negative value is what typical two's-complement platforms produce.

In decimal this is -2147483648. Think of the problems this may cause!

Compilers will not detect this and the application will not notice this issue.
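Since nothing flags the overflow for us, the check has to be written by hand before the arithmetic is performed. The following is a sketch only; checked_add is a hypothetical helper name, and it covers addition of two int values and nothing else.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the result fits in an int.
 * Returns true and stores the sum in *result on success; returns false
 * and leaves *result untouched if the addition would overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow or underflow */
    }
    *result = a + b;
    return true;
}
```

The test is done with subtraction against INT_MAX and INT_MIN so that the check itself can never overflow.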

We get these issues when we use signed integers in comparisons or in arithmetic, and also when comparing signed integers with unsigned integers.

Example:

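int myArray[100];

int fillArray(int v1, int v2) {
    /* Signed comparison: only the upper bound is tested */
    if (v2 > (int)(sizeof(myArray) / sizeof(int))) {
        return -1; /* Too Big !! */
    }
    myArray[v2] = v1;
    return 0;
}

Here, if v2 is a large negative number, the if check does not reject it: the condition only tests whether v2 is bigger than the array size. (The size is cast to int so that the comparison is signed; compared directly against the unsigned result of sizeof, a negative v2 would first be converted to a very large unsigned value.) Note also that a v2 equal to the array size, 100, slips past the > test.

The line myArray[v2] = v1 then assigns the value v1 to a location outside the bounds of the array, causing unexpected results.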
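A corrected version of the fillArray idea might look like the sketch below. The names fillArrayChecked and MY_ARRAY_LEN are made up for illustration; what matters is that both the lower and the upper bound are tested before the write.

```c
#define MY_ARRAY_LEN 100

static int myArray[MY_ARRAY_LEN];

/* Stores value at index. Returns 0 on success, or -1 if index is
 * negative or not less than the number of elements in the array. */
int fillArrayChecked(int value, int index)
{
    if (index < 0 || index >= MY_ARRAY_LEN) {
        return -1;      /* out of range in either direction */
    }
    myArray[index] = value;
    return 0;
}
```

Using >= rather than > for the upper bound also closes the off-by-one case where the index equals the array length.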

Good Patterns & procedures to prevent buffer overflows:

Example:

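#include <string.h>

void copyData(char *userId) {
    char smallBuffer[10];                                   // size of 10
    strncpy(smallBuffer, userId, sizeof(smallBuffer) - 1);  // copy at most 9 characters
    smallBuffer[sizeof(smallBuffer) - 1] = '\0';            // strncpy may not terminate the string
}

int main(int argc, char *argv[]) {
    char *userId = "01234567890";   // payload of 11 characters
    copyData(userId);               // input is truncated, no overflow
    return 0;
}

The code above is not vulnerable to buffer overflow because the copy is limited to a specified length, one less than the size of the buffer. That leaves room for the terminating null character, which strncpy does not add when the input is too long.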

C library functions such as strcpy, strcat, sprintf and vsprintf operate on null-terminated strings and perform no bounds checking. gets is another function that reads input (into a buffer) from stdin until a terminating newline or EOF (End of File) is found, with no limit on length. The scanf family of functions may also result in buffer overflows.

Using strncpy, strncat, snprintf and fgets mitigates this problem, because each takes the maximum number of bytes to write.

Always check the bounds of an array before writing to it.
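As a sketch of how a bounded function can also tell us that the input was too long, the hypothetical helper copy_bounded below wraps snprintf and reports truncation instead of silently accepting it. The name and return convention are assumptions made for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, which has room for dst_size
 * bytes. Returns 0 if the whole string fitted, -1 if it had to be
 * truncated or the arguments are unusable. On truncation dst holds a
 * terminated prefix of src. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed;

    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }
    needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;      /* input was longer than the buffer allows */
    }
    return 0;
}
```

Whether truncated input should be rejected outright or processed in its shortened form is a design decision for the application; from a review standpoint the important thing is that the caller checks the result.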

.NET & Java

C# or C++ code in the .NET framework can be immune to buffer overflows if the code is managed. Managed code is code executed by a .NET virtual machine, such as Microsoft's. Before the code is run, the Intermediate Language is compiled into native code. The managed execution environment's own runtime-aware compiler performs the compilation; therefore the managed execution environment can guarantee what the code is going to do. The Java language also does not suffer from buffer overflows; as long as native methods or system calls are not invoked, buffer overflows are not an issue.

TO DO – Unsafe methods which cause arithmetic overflows.

Data Validation

One key area in web application security is the validation of data input from an external source. Many application exploits are derived from weak input validation on the part of the application. Weak data validation gives the attacker the opportunity to make the application perform some functionality which it is not meant to perform.

Canonicalization of input.

Input can be encoded to a format that can still be interpreted correctly by the application but may not be an obvious avenue of attack.

The encoding of ASCII to Unicode is another method of bypassing input validation. Applications rarely test for Unicode exploits, which provides the attacker with a route of attack.

The point to remember is that the application must remain safe when a Unicode or other malformed representation is input: it should respond correctly and recognise all possible representations of invalid characters.

Example:

The ASCII: <script> (If we simply block the “<” and “>” characters, the other representations below shall pass data validation and execute.)

URL encoded: %3C%73%63%72%69%70%74%3E

Unicode encoded: &#60&#115&#99&#114&#105&#112&#116&#62
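One way to picture canonicalization is to decode the input to a single canonical form first and only then validate it. The sketch below handles one pass of %XX URL decoding and nothing more; the function names are hypothetical, the 256-byte working buffer is an arbitrary assumption, and a real application would also have to consider double encoding and other encodings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Converts one hexadecimal digit to its value; the caller guarantees
 * that c satisfies isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Hypothetical helper: decodes %XX escapes from in into out.
 * Returns 0 on success, -1 on a malformed escape, an encoded NUL byte,
 * or if out is too small. */
int url_decode(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0) {
        return -1;
    }
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];
        if (c == '%') {
            if (!isxdigit((unsigned char)in[i + 1]) ||
                !isxdigit((unsigned char)in[i + 2])) {
                return -1;              /* malformed escape */
            }
            c = (char)(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            if (c == '\0') {
                return -1;              /* reject an encoded NUL byte */
            }
            i += 2;
        }
        if (o + 1 >= out_size) {
            return -1;                  /* no room for c plus the terminator */
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}

/* Returns 1 if the decoded form of input contains '<' or '>', 0 if it
 * does not, and -1 if the input could not be decoded (treat as rejected). */
int contains_angle_brackets_after_decoding(const char *input)
{
    char decoded[256];

    if (url_decode(input, decoded, sizeof decoded) != 0) {
        return -1;
    }
    return strpbrk(decoded, "<>") != NULL;
}
```

Run against the URL-encoded example above, the decoder yields the plain ASCII form, so a check for “<” and “>” that would have missed the encoded input now catches it.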

The OWASP Guide 2.x delves much more into this subject.

Data validation strategy

A general rule is to accept only “Known Good” characters, i.e. the characters that are to be expected. If this cannot be done the next strongest strategy is “Known bad”, where we reject all known bad characters. The issue with this is that today’s known bad list may expand tomorrow as new technologies are added to the enterprise infrastructure.

Data Validation Strategy

There are a number of models to think about when designing a data validation strategy.

1.	Exact Match (Constrain)
2.	Known Good (Accept)
3.	Reject Known bad (Reject)
4.	Encode Known bad (Sanitise)

In addition there must be a check for the maximum length of any input received from an external source, such as a downstream service/computer or a user at a web browser. Rejected data must not be persisted to the data store unless it is sanitised. It is a common mistake to log erroneous data, but that may be exactly what the attacker wishes your application to do.

·	Exact Match: (preferred method) Only accept values from a finite list of known values. E.g.: A Radio button component on a Web page has 3 settings (A, B, C). Only one of those three settings must be accepted (A or B or C). Any other value must be rejected.

·	Known Good: If we do not have a finite list of all the possible values that can be entered into the system, we use the known good approach. E.g.: an email address; we know it shall contain one and only one @. It may also have one or more full stops “.”. The rest of the information can be anything from [a-z] or [A-Z] or [0-9] and some other characters such as “_” or “-”, so we let these ranges in and define a maximum length for the address.

·	Reject Known bad: We have a list of known bad values we do not wish to be entered into the system. This occurs on free form text areas and areas where a user may write a note. The weakness of this model is that today's known bad list may not be sufficient tomorrow.

·	Encode Known Bad: This is the weakest approach. This approach accepts all input but HTML encodes any characters within a certain character range. HTML encoding is done so if the input needs to be redisplayed the browser shall not interpret the text as script, but the text looks the same as what the user originally typed.
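The “Known Good” email rule described in the list above can be sketched as follows. This is an illustration of the whitelist idea only, not a complete email address validator; the function name and the 254-character limit are assumptions made for the example.

```c
#include <stddef.h>

#define MAX_EMAIL_LEN 254   /* assumed maximum length for this sketch */

/* Returns 1 for an ASCII letter or digit, 0 otherwise. */
static int is_ascii_alnum(char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical "known good" check: exactly one '@', only letters,
 * digits, '.', '_' and '-' otherwise, and a bounded length.
 * Returns 1 if the input is acceptable, 0 if it must be rejected. */
int is_known_good_email(const char *s)
{
    size_t len = 0;
    int at_count = 0;

    for (; s[len] != '\0'; len++) {
        char c = s[len];
        if (len >= MAX_EMAIL_LEN) {
            return 0;                   /* too long */
        }
        if (c == '@') {
            at_count++;
        } else if (!is_ascii_alnum(c) && c != '.' && c != '_' && c != '-') {
            return 0;                   /* character outside the known good set */
        }
    }
    return len > 0 && at_count == 1;
}
```

Anything not explicitly allowed is rejected, so new attack characters do not require the list to be updated, which is the advantage over the “Reject Known bad” model.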

HTML-encoding and URL-encoding of user input should be applied when writing back to the client. In this case, the assumption is that no input is treated as HTML and all output is written back in a protected form. This is sanitisation in action.
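To make the encoding step concrete, here is a small sketch of an HTML encoder. The function name html_encode and the set of five characters it replaces are choices made for this example; production code would normally rely on the encoding routine supplied by its framework.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes an HTML-encoded copy of in to out.
 * Returns 0 on success, -1 if out is too small. */
int html_encode(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0) {
        return -1;
    }
    for (size_t i = 0; in[i] != '\0'; i++) {
        char single[2] = { in[i], '\0' };
        const char *rep;
        size_t n;

        switch (in[i]) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = single;   break;
        }
        n = strlen(rep);
        if (n >= out_size - o) {
            return -1;                  /* keep room for the terminator */
        }
        memcpy(out + o, rep, n);
        o += n;
    }
    out[o] = '\0';
    return 0;
}
```

The browser then displays the text exactly as the user typed it but does not interpret it as markup or script.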

Good Patterns for Data validation

A good example of a pattern for data validation to prevent OS injection in PHP applications would be as follows:

$string = preg_replace("/[^a-zA-Z0-9]/", "", $string);

The code above would replace any non-alphanumeric characters with “”. preg_match could also be used for a true or false result. This would enable us to let “only known good” characters into the application.

Using regular expressions is a common method of restricting input character types. A common mistake in the development of regular expressions is not escaping characters, which are interpreted as control characters, or not validating all avenues of input.

Examples of regular expressions are as follows:

http://www.regxlib.com/CheatSheet.htm

^[a-zA-Z]$ 	A single alpha character, a to z or A to Z (RegEx is case sensitive); add + or * after the brackets to match a whole string.
^[0-9]$ 	A single numeric character (0 to 9).
[abcde] 	Matches any single character specified in the set.
[^abcde] 	Matches any single character not specified in the set.
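For comparison with the PHP pattern earlier, the same “known good” check can be sketched in C using the POSIX regular expression functions regcomp and regexec. This assumes a POSIX system that provides regex.h; the function name matches_alnum_only is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if input consists only of ASCII letters and digits (at
 * least one character), 0 if it does not, and -1 if the expression
 * could not be compiled. */
int matches_alnum_only(const char *input)
{
    regex_t re;
    int rc;

    /* Anchored at both ends so the whole string must match */
    if (regcomp(&re, "^[0-9a-zA-Z]+$", REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Note the anchors and the + quantifier: without them the expression would accept any string that merely contains one alphanumeric character, which is one of the common regular expression mistakes mentioned above.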

Framework Example(Struts 1.2)

In the J2EE world the struts framework (1.1) contains a utility called the commons validator. This enables us to do two things.

1.	Enables us to have a central area for data validation.
2.	Provides us with a data validation framework.

What to look for when examining Struts is as follows.

The struts-config.xml file must contain the following:







This tells the framework to load the validator plug-in. It also loads the property files defined by the comma-separated list. By default a developer would add regular expressions for the defined fields in the validation.xml file.

Next we look at the form beans for the application. In struts, form beans are on the server side and encapsulate the information sent to the application via a HTTP form. We can have concrete form beans (built in code by developers) or dynamic form beans. Here is a concrete bean below:

package com.pcs.necronomicon;

import org.apache.struts.validator.ValidatorForm;

public class LogonForm extends ValidatorForm {

    private String username;
    private String password;

    public String getUsername() {
        return username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public String getPassword() {
        return password;
    }

    public void setPassword(String password) {
        this.password = password;
    }
}

Note that LogonForm extends ValidatorForm. This is a must, as the parent class (ValidatorForm) has a validate method which is called automatically and applies the rules defined in validation.xml.

Now to be assured that this form bean is being called we look at the struts-config.xml file: It should have something like the following:

  

Next we look at the validation.xml file. It should contain something similar to the following:

    

Note that the same form name appears in both validation.xml and struts-config.xml; this is an important relationship and is case sensitive.

The field “username” is also case sensitive and refers to the String username in the LogonForm class. The “depends” directive dictates that the parameter is required. If this is blank, the error defined in Application.properties is displayed. This configuration file contains error messages among other things. It is also a good place to look for information leakage issues:

# Error messages for Validator framework validations
errors.required={0} is required.
errors.minlength={0} cannot be less than {1} characters.
errors.maxlength={0} cannot be greater than {2} characters.
errors.invalid={0} is invalid.
errors.byte={0} must be a byte.
errors.short={0} must be a short.
errors.integer={0} must be an integer.
errors.long={0} must be a long.
errors.float={0} must be a float.
errors.double={0} must be a double.
errors.date={0} is not a date.
errors.range={0} is not in the range {1} through {2}.
errors.creditcard={0} is not a valid credit card number.
errors.email={0} is an invalid e-mail address.
prompt.username = User Name is required.

The error defined by arg0, prompt.username is displayed as an alert box by the struts framework to the user.

The developer would need to take this a step further by validating the input via a regular expression:

<var> <var-name>mask</var-name> <var-value>^[0-9a-zA-Z]*$</var-value> </var> </form-validation>

Here we have added the mask directive; this specifies a variable and a regular expression. Any input into the username field which contains anything other than A to Z, a to z or 0 to 9 shall cause an error to be thrown. The most common issue with this type of development is the developer forgetting to validate all fields, or a complete form. The other thing to look for is incorrect regular expressions, so learn those RegExes, kids!

We also need to check that the JSP pages have been linked up to the validation.xml functionality. This is done by including the <html:javascript> custom tag in the JSP as follows:

<html:javascript formName="logonForm" dynamicJavascript="true" staticJavascript="true" />

Framework example(.NET):

The ASP .NET framework contains a validator framework, which has made input validation easier and less error prone than in the past. The validation solution for .NET also has client-side and server-side functionality akin to Struts (J2EE).

What is a validator? According to the Microsoft (MSDN) definition it is as follows:

"A validator is a control that checks one input control for a specific type of error condition and displays a description of that problem."

The main point to take out of this from a code review perspective is that one validator does one type of function. If we need to do a number of different checks on our input we need to use more than one validator.

The .NET solution contains a number of controls out of the box:

RequiredFieldValidator – Makes the associated input control a required field.

CompareValidator – Compares the value entered by the user into an input control with the value entered into another input control or a constant value.

RangeValidator – Checks if the value of an input control is within a defined range of values.

RegularExpressionValidator – Checks user input against a regular expression.

The following is an example web page (.aspx) containing validation:

Validate me baby!
<asp:ValidationSummary runat=server HeaderText="There were errors on the page:" />
Please enter your User Id
<asp:RequiredFieldValidator runat=server ControlToValidate=Name ErrorMessage="User ID is required."> </asp:RequiredFieldValidator>
User ID: <input type=text runat=server id=Name>
<asp:RegularExpressionValidator runat=server display=dynamic controltovalidate="Name" errormessage="ID must be 6-8 letters." validationexpression="[a-zA-Z0-9]{6,8}" />
<input type=submit runat=server id=SubmitMe value=Submit>

Remember to check the regular expressions so that they are sufficient to protect the application. The “runat” directive means this code is executed at the server prior to being sent to the client. When this is displayed in a user's browser the code is simply HTML.

Length Checking: Another issue to consider is input length validation. If the input is limited in length, this reduces the size of the script that can be injected into the web app. Many web applications use operating system features and external programs to perform their functions. When a web application passes information from an HTTP request through as part of an external request, it must be carefully validated for content and min/max length. Without data validation the attacker can inject meta characters, malicious commands or command modifiers masquerading as legitimate information, and the web application will blindly pass these on to the external system for execution.

Checking for minimum and maximum length is of paramount importance, even if the code base is not vulnerable to buffer overflow attacks.

If a logging mechanism is employed to log all data used in a particular transaction we need to ensure that the payload received is not so big that it may affect the logging mechanism.

If the log file is sent a very large payload the logging mechanism may crash, or if it is sent a very large payload repeatedly the hard disk of the app server may fill, causing a denial of service. This type of attack can also be used to recycle the log file, hence removing the audit trail.

If string parsing is performed on the payload received by the application, and an extremely large string is sent repeatedly to the application, the CPU cycles used by the application to parse the payload may cause service degradation or even denial of service.
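A length check is cheap to perform before any parsing or logging takes place. The sketch below uses a hypothetical function name, length_in_range, and stops scanning as soon as the maximum is exceeded, so an oversized payload costs very little to reject.

```c
#include <stddef.h>

/* Returns 1 if the terminated string s has a length between min_len
 * and max_len inclusive, 0 otherwise. Scanning stops as soon as the
 * maximum is exceeded, so very large input is not walked to the end. */
int length_in_range(const char *s, size_t min_len, size_t max_len)
{
    size_t len = 0;

    while (s[len] != '\0') {
        if (len == max_len) {
            return 0;           /* too long: stop scanning early */
        }
        len++;
    }
    return len >= min_len;
}
```

The limits themselves (for example 6 to 16 characters for a password field) are a business decision; the reviewer's job is to confirm that some limit is enforced on the server for every input.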

Never Rely on Client-Side Data Validation

Client-side validation can always be bypassed. Server-side code should perform its own validation. What if an attacker bypasses your client, or shuts off your client-side script routines, for example, by disabling JavaScript?

Use client-side validation to help reduce the number of round trips to the server but do not rely on it for security.

Remember: Data validation must always be done on the server side. A code review focuses on server-side code. Any client-side security code is not and cannot be considered security.

Data validation of parameter names

When data is passed to a method of a web application via HTTP, the payload is passed as “key-value” pairs such as

UserId=3o1nk395y
password=letMeIn123

Previously we talked about input validation of the payload (parameter value) being passed to the application. But we may also need to check that the parameter names (UserId and password above) have not been tampered with. Invalid parameter names may cause the application to crash or act in an unexpected way. The best approach is “Exact Match” as mentioned previously.
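An “Exact Match” check on parameter names can be as simple as comparing each incoming name against a fixed list. In this sketch the list contains the two names from the example above; the array and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Parameter names this hypothetical request handler expects */
static const char *const EXPECTED_PARAMS[] = { "UserId", "password" };

/* Returns 1 if name exactly matches one of the expected parameter
 * names (case sensitive), 0 otherwise. */
int is_expected_param_name(const char *name)
{
    size_t count = sizeof EXPECTED_PARAMS / sizeof EXPECTED_PARAMS[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, EXPECTED_PARAMS[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A request carrying any name outside the list would then be rejected before its values are looked at.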

Web services data validation

The recommended input validation technique for web services is to use a schema. A schema is a “map” of all the allowable values that each parameter can take for a given web service method. When a SOAP message is received by the web services handler, the schema pertaining to the method being called is “run over” the message to validate its content.

There are two types of web service communication methods: XML-IN/XML-OUT and REST (Representational State Transfer). XML-IN/XML-OUT means that the request is in the form of a SOAP message and the reply is also SOAP. REST web services accept a URI request (non-XML) but return an XML reply. REST only supports a point-to-point solution, whereas a SOAP chain of communication may have multiple nodes prior to the final destination of the request.

Validating REST web service input is the same as validating a GET request. Validating an XML request is best done with a schema.

<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://server.test.com" targetNamespace="http://server.test.com" elementFormDefault="qualified" attributeFormDefault="unqualified">

<xsd:complexType name="AddressIn">

<xsd:sequence>

<xsd:element name="addressLine1" type="HundredANumeric" nillable="true"/>

<xsd:element name="addressLine2" type="HundredANumeric" nillable="true"/>

<xsd:element name="county" type="TenANumeric" nillable="false"/>

<xsd:element name="town" type="TenANumeric" nillable="true"/>

<xsd:element name="userId" type="TenANumeric" nillable="false"/>

</xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="HundredANumeric">

<xsd:restriction base="xsd:string">

<xsd:minLength value="1"/>

<xsd:maxLength value="100"/>

<xsd:pattern value="[a-zA-Z0-9]+"/>

</xsd:restriction>

</xsd:simpleType>

<xsd:simpleType name="TenANumeric">

<xsd:restriction base="xsd:string">

<xsd:minLength value="1"/>

<xsd:maxLength value="10"/>

<xsd:pattern value="[a-zA-Z0-9]+"/>

</xsd:restriction>

</xsd:simpleType>

</xsd:schema>

Here we have a schema for an object called AddressIn. Each of the elements has restrictions applied to it, and the restrictions (the xsd:restriction blocks) define what valid characters can be input into each of the elements.

What we need to look for is that each of the elements has a restriction applied to it, as opposed to a simple type definition such as xsd:string.

This schema also has the <xsd:sequence> tag applied to enforce the sequence of the data that is to be received.