PDF Attack Filter for Apache mod rewrite

Revision as of 06:13, 17 January 2007 by Jonz



This is a filter that blocks XSS attacks on PDF files served by Apache with mod_rewrite installed. I developed the algorithm independently, but later found it to be almost identical to one developed by Amit Klein.

So far I've seen two real-world implementations of this algorithm: one for Java EE, here on OWASP, and another built with F5 iRules.

Adobe has published official server-side workarounds. They work by changing either the Content-Type or the Content-Disposition header so that the PDF is downloaded instead of being opened by the Adobe plug-in. I consider the Content-Type change less desirable, as I've seen it work improperly in some cases. The downside of either workaround is that, for long-established commercial websites, it changes the site's effective functionality: on PCs configured with the Adobe plug-in, PDFs no longer open in the browser.

Ideally, to avoid losing that functionality, we can strip the anchor/fragment from the original URL by forcing a redirect to a URL that carries a hard-to-reproduce but verifiable token in the query string (or even in the location).

The details of the attack are discussed elsewhere. This filter implements a simple algorithm suggested by Amit Klein. We've placed this software in the public domain to make it easy for anyone to use for any purpose.


Credits

  1. OWASP Java EE entry, for structure and skeleton contents.
  2. Algorithm developed by Amit Klein.
  3. Base64 code (from Samba).
  4. OpenSSL encryption code, by Vinayak Hegde.


The approach is almost the same as the one described for the Java EE implementation. The main difference is that when a request arrives with a query string that doesn't match our generated token, instead of redirecting so that the file is saved, we redirect with a new, correct token.


Apache mod_rewrite rules

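	RewriteEngine   On
	# Debug logging; high levels slow the server (removed in Apache 2.4)
	RewriteLog "logs/rewrite.log"
	RewriteLogLevel 10

	# Serialize access to the prg map (Apache 2.0/2.2; 2.4 uses Mutex instead)
	RewriteLock "logs/rewrite.lock"
	RewriteMap tokenize prg:/home/jon/tokenize

	# Rule 1: PDF requested with no query string -> redirect with a token
	RewriteCond %{REQUEST_URI} \.pdf$ [NC]
	RewriteCond %{QUERY_STRING} ^$
	RewriteRule ^(.*)$ $1?${tokenize:%{REMOTE_ADDR}} [R,L]

	# Rule 2: replace the query string with tokenize's reply
	# (empty if the token is valid, otherwise a new token)
	RewriteCond %{REQUEST_URI} \.pdf$ [NC]
	RewriteRule ^(.*)$ $1?${tokenize:%{REMOTE_ADDR}%{QUERY_STRING}}

	# Rule 3: anything still in the query string was invalid -> redirect
	RewriteCond %{REQUEST_URI} \.pdf$ [NC]
	RewriteCond %{QUERY_STRING} !^$
	RewriteRule ^(.*)$ $1?${tokenize:%{REMOTE_ADDR}} [R,L]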

1) The first rule set matches every request for a ".pdf" file with an empty query string and forces a redirect to the same PDF, with a query string containing our token. The token is created by the external program "tokenize" (see below). In this case we pass the client IP address, which tokenize combines with a time-based parameter to create the token.


Tokenize creates a string, encrypts it, turns it into base64 and returns the resulting string, prefixed with the delimiter "TOKENDELIMITER".


The redirect therefore points to the same PDF, with "TOKENDELIMITER" followed by the base64 token as its query string.
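As a rough illustration of what rule 1 hands to Apache, the sketch below builds the plaintext that tokenize encrypts (the client IP followed by the current time rounded down to a multiple of TOKEN_EXPIRY_TIME, as gen_token does in the listing further down) and the resulting redirect target. The helper names are hypothetical, the encryption step is replaced by a caller-supplied string, and the 3600-second expiry is only an example value.

```c
#include <stdio.h>
#include <time.h>

#define TOKEN_PATTERN     "TOKENDELIMITER"
#define TOKEN_EXPIRY_TIME 3600  /* example value only, not from the original */

/* Hypothetical helper: the plaintext tokenize encrypts for rule 1, i.e.
   the client info followed by the time rounded down to the bucket start. */
int build_plaintext(const char *client, time_t now, char *out, size_t outlen)
{
    time_t bucket = now - (now % TOKEN_EXPIRY_TIME);
    return snprintf(out, outlen, "%s%ld", client, (long)bucket);
}

/* Hypothetical helper: the URL rule 1 redirects to. tokenize's reply is
   the delimiter followed by the base64 token, appended after '?'. */
int build_redirect(const char *path, const char *token64,
                   char *out, size_t outlen)
{
    return snprintf(out, outlen, "%s?%s%s", path, TOKEN_PATTERN, token64);
}
```

For example, build_plaintext("192.0.2.1", 1168756000, ...) yields "192.0.2.11168754400", and build_redirect("/docs/a.pdf", "ENCRYPTED", ...) yields "/docs/a.pdf?TOKENDELIMITERENCRYPTED", where "ENCRYPTED" stands in for the real base64 ciphertext.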


2) The second rule handles a client that arrives with a query string already set: it calls tokenize to remap the query string, passing it the remote address followed by the query string. N.B. if the query string is a properly formed token, it starts with the delimiter that tokenize put in, which separates it from what we pass before it (i.e. the remote address). It is important to pass the remote address, as in the first rule, because tokenize has to recreate exactly the same string.

Tokenize uses the string up to "TOKENDELIMITER", plus a time-based value, to create a token. It then compares the newly created token with any token contained in the string (for example, "Moa43/zBdWp+E474FkoOkgJ2ZKNds6N"). There are two possibilities:

  1. No token, or the token doesn't match: tokenize returns a new token.
  2. The token matches: tokenize returns an empty string.
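The two possibilities above can be sketched as a single reply function. To keep the comparison logic visible without OpenSSL, the sketch uses a hypothetical stub_token in place of the real encrypt-and-base64 step (and ignores the time component); tokenize_reply is likewise a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define TOKEN_PATTERN "TOKENDELIMITER"

/* Hypothetical stand-in for gen_token(): derives a "token" from the client
   info only. The real program also adds the time and encrypts the result. */
static void stub_token(const char *client, size_t len, char *out, size_t outlen)
{
    snprintf(out, outlen, "T%.*s", (int)len, client);
}

/* Reply for one request line: "" when the token after the delimiter
   matches, otherwise the delimiter followed by a freshly generated token. */
void tokenize_reply(const char *line, char *reply, size_t replylen)
{
    char expected[256];
    const char *delim = strstr(line, TOKEN_PATTERN);
    size_t clen = delim ? (size_t)(delim - line) : strlen(line);

    stub_token(line, clen, expected, sizeof expected);
    if (delim && strcmp(delim + strlen(TOKEN_PATTERN), expected) == 0)
        reply[0] = '\0';                                  /* token matches */
    else
        snprintf(reply, replylen, "%s%s", TOKEN_PATTERN, expected); /* new */
}
```

Feeding it "192.0.2.1TOKENDELIMITERT192.0.2.1" yields an empty reply, while the same token presented from a different client address yields a fresh token, which is why the remote address must be part of the input.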

3) At this point we get to the third rule. If the token was valid, rule 2 has already replaced the query string with tokenize's empty reply, so the query string is non-empty here only when it did not contain a correct token (perhaps a token that is no longer valid). In that case we basically redo rule 1, i.e. issue another redirect with a fresh token. Without this third, catch-all rule, appending any query string would bypass the entire mechanism, because rule 2 on its own never redirects.
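To see why the catch-all matters, the rule chain can be modeled as a small decision function. This is a hypothetical model, not mod_rewrite itself: reply stands for what tokenize returns in rule 2, and catch_all switches rule 3 on or off.

```c
#include <stdbool.h>

/* Outcomes of one pass through the three rewrite rules. */
enum outcome { SERVE_PDF, REDIRECT_NEW_TOKEN };

/* Hypothetical model of the rule chain for a PDF request. */
enum outcome apply_rules(const char *query, const char *reply, bool catch_all)
{
    /* Rule 1: empty query string -> redirect with a fresh token */
    if (query[0] == '\0')
        return REDIRECT_NEW_TOKEN;
    /* Rule 2: the query string is replaced by tokenize's reply */
    query = reply;
    /* Rule 3: anything left in the query string was not a valid token */
    if (catch_all && query[0] != '\0')
        return REDIRECT_NEW_TOKEN;
    return SERVE_PDF;
}
```

With rule 3 enabled, an arbitrary query string such as "junk" leads to another redirect; with it disabled, the same request is served directly, without the fragment-clearing redirect.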

Now on to "tokenize". This is a mod_rewrite external program map (map type "prg"). It is started when the rewrite engine is initialized and loops on stdin, where it receives newline-delimited text to transform. Output goes to stdout, one line per request, terminated by a newline. Our version of tokenize interprets the input text as two pieces: some client-based info, such as the client IP, and a possible token to check.

Tokenize generates a token from all of the text before "TOKENDELIMITER" plus any extra info, such as the current time (perhaps with the minutes dropped). If the generated token matches the text after "TOKENDELIMITER", it writes a blank line on stdout; otherwise it writes "TOKENDELIMITER" followed by the new token.
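The stdin/stdout contract described above can be sketched with stdio. The names serve, reply_fn and length_reply are hypothetical; length_reply is only a placeholder for the real token check. The points the sketch shows are one flushed reply line per request line, no other output on stdout, and overlong lines truncated rather than split (splitting would shift every later reply onto the wrong request).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signature for the per-request logic (e.g. the token check). */
typedef void (*reply_fn)(const char *line, char *reply, size_t replylen);

/* Hypothetical placeholder reply: the length of the request line. */
void length_reply(const char *line, char *reply, size_t replylen)
{
    snprintf(reply, replylen, "%zu", strlen(line));
}

/* Minimal prg-map loop: read newline-terminated requests from in and write
   exactly one newline-terminated reply to out for each, flushing at once. */
void serve(FILE *in, FILE *out, reply_fn fn)
{
    char line[4096], reply[4096];
    size_t len = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c != '\n') {
            if (len < sizeof line - 1)   /* truncate, keep consuming */
                line[len++] = (char)c;
            continue;
        }
        line[len] = '\0';
        fn(line, reply, sizeof reply);
        fprintf(out, "%s\n", reply);
        fflush(out);                     /* Apache waits for this line */
        len = 0;
    }
}
```

In a real map, serve(stdin, stdout, ...) would be called from main; the listing below does the same job with read() and write(), which are unbuffered.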


"C" source code - mod_rewrite prg

This code has only been minimally tested. Please help us verify the approach and the implementation used here.

Note: to use it, you must fill in the key and the initial value (IV), and of course these must be kept secret. You may also want to modify how the time/date is used, and add any other system variables you want to include. The program must be linked against OpenSSL's libcrypto (e.g. with -lcrypto).
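Because gen_token rounds the time down before encrypting, a token does not live for a fixed period: it stops matching at the next bucket boundary. The hypothetical helper below, assuming non-negative timestamps and an expiry such as 3600 seconds (an example value), computes how long a freshly issued token remains valid.

```c
#include <time.h>

/* Hypothetical helper: seconds until a token issued at `now` stops
   matching, given that the time is rounded down to a multiple of `expiry`.
   The result lies in 1..expiry depending on where `now` falls in the bucket. */
long seconds_until_expiry(time_t now, long expiry)
{
    long into_bucket = (long)(now % expiry);
    return expiry - into_bucket;
}
```

A token issued one second before a boundary is therefore replaced almost immediately, costing the client one extra redirect; choosing the bucket size is the trade-off between token lifetime and how often that happens.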

/*
 *  This software is in the public domain with no warranty.
 * @author     Jon Zaid
 * @created    January 14, 2007
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <unistd.h>		/* read(), write() */
#include <openssl/evp.h>	/* EVP cipher API, EVP_bf_cbc() */

#define IP_SIZE 1024		/* max plaintext length passed to encrypt_str */
#define OP_SIZE 1032		/* IP_SIZE plus one Blowfish block of padding */

/* encode a buffer using base64 - simple and slow algorithm. null terminates
   the result. No '=' padding is emitted.
   Code taken from http://www.samba.org/ftp/unpacked/junkcode/base64.c */
static void base64_encode(const char *buf, int len, char *out)
{
	const char *b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	int bit_offset, byte_offset, idx, i;
	const unsigned char *d = (const unsigned char *)buf;
	int bytes = (len*8 + 5)/6;

	memset(out, 0, bytes+1);

	for (i=0;i<bytes;i++) {
		byte_offset = (i*6)/8;
		bit_offset = (i*6)%8;
		if (bit_offset < 3) {
			idx = (d[byte_offset] >> (2-bit_offset)) & 0x3F;
		} else {
			idx = (d[byte_offset] << (bit_offset-2)) & 0x3F;
			if (byte_offset+1 < len) {
				idx |= (d[byte_offset+1] >> (8-(bit_offset-2)));
			}
		}
		out[i] = b64[idx];
	}
}

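/* Secret Blowfish key (16 bytes) and IV (8 bytes): replace every xx with
   your own secret byte values before compiling. */
static const unsigned char key[16] = { xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx };
static const unsigned char iv[8] = { xx, xx, xx, xx, xx, xx, xx, xx };

/* Encrypt a NUL-terminated string with Blowfish-CBC and base64-encode the
   result into out. Adapted from OpenSSL encryption code by Vinayak Hegde.
   Returns 1 on success, 0 on failure. Errors go to stderr, because stdout
   is the reply channel to mod_rewrite. */
static int encrypt_str(const char *in, char *out)
{
	int olen = 0, tlen = 0;
	unsigned char outbuf[OP_SIZE];
	EVP_CIPHER_CTX *ctx;

	if (strlen(in) >= IP_SIZE)
		return 0;
	if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
		return 0;
	/* With OpenSSL 3.x, Blowfish is only available from the legacy provider. */
	if (EVP_EncryptInit_ex(ctx, EVP_bf_cbc(), NULL, key, iv) != 1) {
		fprintf(stderr, "error in encrypt init\n");
		EVP_CIPHER_CTX_free(ctx);
		return 0;
	}
	if (EVP_EncryptUpdate(ctx, outbuf, &olen,
			      (const unsigned char *)in, (int)strlen(in)) != 1) {
		fprintf(stderr, "error in encrypt update\n");
		EVP_CIPHER_CTX_free(ctx);
		return 0;
	}
	if (EVP_EncryptFinal_ex(ctx, outbuf + olen, &tlen) != 1) {
		fprintf(stderr, "error in encrypt final\n");
		EVP_CIPHER_CTX_free(ctx);
		return 0;
	}
	olen += tlen;
	base64_encode((const char *)outbuf, olen, out);

	EVP_CIPHER_CTX_free(ctx);
	return 1;
}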

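#define MAX_TOKEN_LEN	1024
#define TOKEN_PATTERN	"TOKENDELIMITER"	/* delimiter placed before the token */
#define TOKEN_EXPIRY_TIME	3600		/* example value: time bucket in seconds */

/* in contains the request buffer, which is an arbitrary string (e.g. the
   client IP) possibly followed by TOKEN_PATTERN and a token.
   The new token is built from the string up to TOKEN_PATTERN plus the
   current time rounded down to a multiple of TOKEN_EXPIRY_TIME, then
   encrypted and base64-encoded into token64. Returns 1 on success. */
static int gen_token(const char *in, char *token64, int token64_len)
{
	time_t now;
	char token[MAX_TOKEN_LEN];
	size_t len;
	const char *pToken;

	/* token64 must hold the base64 of up to OP_SIZE bytes plus a NUL */
	if (token64_len < (OP_SIZE * 8 + 5) / 6 + 1)
		return 0;

	/* copy first part of buffer (up to the delimiter) to token */
	if ((pToken = strstr(in, TOKEN_PATTERN)) != NULL)
		len = (size_t)(pToken - in);
	else
		len = strlen(in);
	if (len >= MAX_TOKEN_LEN)
		len = MAX_TOKEN_LEN - 1;
	memcpy(token, in, len);
	token[len] = '\0';

	/* now add time to it, rounded down so the token stays valid for a while */
	now = time(NULL);
	now = now - (now % TOKEN_EXPIRY_TIME);
	snprintf(&token[len], sizeof(token) - len, "%ld", (long)now);
	return encrypt_str(token, token64);
}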

/* mod_rewrite prg handler.
   Requests arrive on stdin delimited by \n; exactly one reply line is
   written to stdout for every request:

	generate a token based on input buffer contents and any other
	sysinfo we want;
	if the request buffer already contains a token, compare it with
	the generated one;
	if (generated same as passed token)
		return empty line;
	else
		return TOKEN_PATTERN followed by the generated token;
*/
int main(void)
{
	while (1) { /* for all requests */
		char reqBuffer[4096];
		char c;
		size_t len = 0;
		ssize_t charsRead;
		char token64[MAX_TOKEN_LEN * 2];
		char *pToken;

		/* read one request; overlong input is truncated, but the rest
		   of the line is still consumed so replies stay in sync */
		while ((charsRead = read(0, &c, 1)) == 1 && c != '\n') {
			if (len < sizeof(reqBuffer) - 1)
				reqBuffer[len++] = c;
		}
		if (charsRead <= 0)
			break;	/* EOF or error: Apache closed the pipe */
		reqBuffer[len] = '\0';

		/* based on contents of request, generate a string token. Must be
		   zero-terminated and ASCII printable chars, e.g. base64 */
		if (!gen_token(reqBuffer, token64, sizeof(token64)))
			token64[0] = '\0';	/* treated as a mismatch below */

		/* find pointer to start of the actual encrypted token in the
		   request buffer, e.g. for "<client ip>TOKENDELIMITERencryptedtoken"
		   we will point to "encryptedtoken" */
		pToken = strstr(reqBuffer, TOKEN_PATTERN);
		if (pToken)
			pToken += strlen(TOKEN_PATTERN);

		/* compare newly generated token with the one that was passed;
		   if they differ output the newly generated token, else output
		   only the terminating newline (an empty reply) */
		if (!pToken || token64[0] == '\0' || strcmp(token64, pToken) != 0) {
			/* debug trace to stderr: stdout is the map's reply channel */
			fprintf(stderr, "%s %s\n", token64, pToken ? pToken : "NULL");
			write(1, TOKEN_PATTERN, strlen(TOKEN_PATTERN));
			write(1, token64, strlen(token64));
		}
		write(1, "\n", 1);
	} /* for all requests */
	return 0;
}
