# Difference between revisions of "How to test session identifier strength with WebScarab"

RoganDawes (talk | contribs) (Added a worked example showing how the calculations are done)


## Revision as of 12:13, 31 October 2006

## Objective

To collect and examine a reasonably large sample of session identifiers, to determine whether they could be vulnerable to prediction or brute-force attacks.

## Approach

Identify a request that generates a suitable session identifier. For example, if the identifier is supplied in a cookie, look for responses that include Set-Cookie headers, then use the request repeatedly to obtain more session identifiers. We will then perform some analysis on the resulting series of identifiers. The WebScarab SessionID analysis plugin currently converts the session identifier into a large integer, using a per-position base-conversion algorithm. I'll explain more about the algorithm later, once we have collected some results.

## Collecting session identifiers

Session identifiers can be collected from Set-Cookie headers as well as from the body of the response. WebScarab will collect all identifiers from all cookies if the radio button is set to "Cookies". In that case it is not necessary to provide a name for the session identifier, as WebScarab will use the site name, path and cookie name to construct a unique identifier. If you choose to extract session identifiers from the body of the response, you have to give the identifier a unique name, and provide a regular expression that defines which part of the response body is considered to be the identifier. This is typically done by using ".*" to match all characters leading up to some unique surrounding text, followed by that unique text, then a pattern surrounded by a regex group (e.g. "(....)" would take 4 characters), and finally ".*" again to match all characters to the end of the body text.

For a more concrete example, let's suppose that the identifier is in a URL query parameter in the body text, and the url parameter is called "id". An example might look like: http://www.example.com/loggedin.aspx?id=<10 alphanumeric characters>

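A suitable regex might be: .*loggedin\.aspx\?id=(.{10}).*

Note that ".{10}" matches any 10 characters; to accept only alphanumeric characters, the group could be written as "([A-Za-z0-9]{10})" instead.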
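To try a pattern like this outside WebScarab, the following is a minimal Python sketch. It is only an illustration of the regex itself, not WebScarab's own matching code. The function name `extract_session_id`, the sample body and the sample identifier are made up for this example, the literal dot in "loggedin.aspx" is escaped, and `re.DOTALL` is an assumption made so that ".*" can also span line breaks in the body.

```python
import re

# Whole-body pattern: leading text, the unique marker, a 10-character group,
# then the rest of the body. DOTALL lets "." match newlines as well.
PATTERN = re.compile(r".*loggedin\.aspx\?id=(.{10}).*", re.DOTALL)


def extract_session_id(body):
    """Return the captured identifier, or None if the body does not match."""
    match = PATTERN.fullmatch(body)
    return match.group(1) if match else None


# Hypothetical response body containing one link with an "id" parameter.
sample_body = (
    "<html>\n"
    '<a href="http://www.example.com/loggedin.aspx?id=a1B2c3D4e5">continue</a>\n'
    "</html>"
)

print(extract_session_id(sample_body))
```

This plays the same role as the "Test" button: it shows what the group would capture from a given body before any samples are collected.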

In order to check that your regular expression is actually correctly matching the text in the response, use the "Test" button to show what would be extracted. The results of the test are not stored for later use.

Once you are satisfied with your configuration, simply enter the number of samples desired, and press "Go". If you decide to interrupt the collection process, you can do so by requesting 0 samples, and pressing "Go" again.

## Analysing the results

As mentioned earlier, WebScarab uses a per-position base-conversion algorithm to convert a string into a number. What this really means is that the string is converted to a number using the same approach that one uses to convert a number of one base (e.g. hex - base 16) to another (e.g. decimal - base 10). The major difference is that the base can change for each position/index, according to what characters have actually been observed in that position throughout the sampled series. This means that if you have a constant character in the middle of your series, the base for that position ends up being "1". The only possible digit value in a base-1 position is 0, and so the constant character plays no part in calculating the numerical value of the total.

Here is a worked example, on a small scale.

Assuming we have the following session ids:

AAAA
AAAC
ABAB
ABAD

Starting from the left-most column (MSB), we have the following observed character sets:

1: "A"
2: "A", "B"
3: "A"
4: "A", "B", "C", "D"

So, our bases are, in order (1,2,1,4).

Let's calculate the value of each id. In order to translate each character to a number, we use its zero-based position in the sorted character set:

AAAA = 0 * (2*1*4) + 0 * (1*4) + 0 * (4) + 0 = 0
AAAC = 0 * (2*1*4) + 0 * (1*4) + 0 * (4) + 2 = 2
ABAB = 0 * (2*1*4) + 1 * (1*4) + 0 * (4) + 1 = 5
ABAD = 0 * (2*1*4) + 1 * (1*4) + 0 * (4) + 3 = 7
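The same calculation can be sketched in a few lines of Python. This is a re-implementation of the algorithm as described above, not WebScarab's actual source. The function name `per_position_values` is made up for this example, and the sketch assumes that all identifiers in the sample have the same length.

```python
def per_position_values(ids):
    """Convert equal-length identifiers to integers, one base per position."""
    length = len(ids[0])
    if any(len(s) != length for s in ids):
        raise ValueError("this sketch assumes identifiers of equal length")

    # Sorted set of characters observed in each column, left to right.
    # The size of each set is the base for that position.
    charsets = [sorted({s[i] for s in ids}) for i in range(length)]

    values = []
    for s in ids:
        value = 0
        for ch, charset in zip(s, charsets):
            # Shift the running total by this position's base, then add the
            # character's zero-based index in the sorted character set.
            value = value * len(charset) + charset.index(ch)
        values.append(value)
    return values


print(per_position_values(["AAAA", "AAAC", "ABAB", "ABAD"]))  # [0, 2, 5, 7]
```

Working left to right and multiplying the running total by each position's base gives the same result as summing each digit times the product of the bases to its right, which is how the worked example is written out.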

## Looking at the graph

The calculated values are then plotted on a graph against time. The idea is that the human eye is very good at visually identifying patterns, which may not be obvious from a list of numbers. The most likely patterns that you will see are lines or bands (possibly interrupted/broken), or else points scattered all over the graph. The first indicates predictability, while the second suggests randomness.

**NOTE!!!** Predictability and randomness are relative terms. If the algorithm appears to be "predictable", but the key space that you'd have to check is greater than about 100000 items, it is likely to be infeasible to actually find a session belonging to someone else during that session's lifetime. Obviously, this depends on your own CPU power, network bandwidth, the target's CPU power and network bandwidth, the typical lifetime of a session, and a bunch of other factors. Please look at the scale of the numbers before deciding that an identifier is predictable.

One very important thing to note about the conversion algorithm is that it works from right (Least Significant Bit) to left (Most Significant Bit), much as one would expect from a numerical conversion. What this means in practice is that if you have a session identifier that has some sequential data at the left, and significant random data to the right, the sequential data will appear to dominate the values, and will result in a straight line graph. Again, check the scale of the numbers before deciding that an identifier is predictable.
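To see why data on the left dominates, it can help to look at the weight each position carries, which is the product of the bases of all positions to its right. The following Python sketch computes those weights. The function name `position_weights` is made up for this example, and the second set of bases (a decimal counter digit followed by four hex characters) is a hypothetical identifier layout, not one taken from a real application.

```python
def position_weights(bases):
    """Return the multiplier applied to each position, left to right."""
    weights = []
    weight = 1
    # Walk from the right-most (least significant) position to the left.
    for base in reversed(bases):
        weights.append(weight)
        weight *= base
    return list(reversed(weights))


# Bases from the worked example earlier in this article.
print(position_weights([1, 2, 1, 4]))          # [8, 4, 4, 1]

# Hypothetical layout: one counter digit, then four random hex characters.
print(position_weights([10, 16, 16, 16, 16]))  # [65536, 4096, 256, 16, 1]
```

In the second case a single step of the counter moves the value by 65536, while the four random characters together can only contribute between 0 and 65535. The plot therefore follows the counter, even though the random part may still be large enough to make guessing impractical.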