Difference between revisions of "Protect FileUpload Against Malicious File"

From OWASP

Revision as of 15:37, 3 November 2017

Last revision (mm/dd/yy): 11/3/2017

Introduction

This article proposes a way to protect a file upload feature against the submission of files containing malicious code.

Context

In web applications, when we expect users to upload working documents, we expose the application to the submission of documents that can be categorized as malicious.

We use the term "malicious" here to refer to documents that embed malicious code that will be executed when another user (admin, back office operator...) opens the document with the associated reader application.

Usually, when an application expects its users to upload a document, it expects to receive a document intended for reading, printing, or archiving. The document should not alter its content when opened and should be in a final rendered state.

The most common file types used to transmit malicious code through a file upload feature are the following:

  • Microsoft Office documents: Word/Excel files using VBA macros and OLE packages.
  • Adobe PDF documents: malicious code inserted as an attachment.
  • Images: malicious code embedded into the file, or a binary file given an image file extension.
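
The last case above (a binary file carrying an image extension) can be illustrated with a small signature check. This is only a sketch and is not part of the prototype: the class `ImageSignatureCheck` and its methods are hypothetical names, it uses only the JDK (Java 11 or later is assumed for `readNBytes`), and it covers just the PNG, JPEG and GIF signatures. A matching signature does not prove that an image is harmless, since a valid image can still embed code, so such a check can at most complement the re-writing approach described in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: compares the first bytes of a file with known image signatures. */
final class ImageSignatureCheck {

    // Well-known file signatures ("magic numbers") of three common image formats
    private static final byte[] PNG = { (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    private static final byte[] JPEG = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF };
    private static final byte[] GIF = { 'G', 'I', 'F', '8' };

    private ImageSignatureCheck() {
    }

    /** Returns true if the given leading bytes start with a PNG, JPEG or GIF signature. */
    static boolean startsWithImageSignature(byte[] header) {
        return startsWith(header, PNG) || startsWith(header, JPEG) || startsWith(header, GIF);
    }

    /** Reads the first 8 bytes of the file and checks them; the file extension is ignored. */
    static boolean startsWithImageSignature(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithImageSignature(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

With this sketch, a Windows executable renamed to `photo.png` starts with the bytes `MZ` and is therefore not recognized as an image, whatever its extension says.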


Approaches

Based on this context, the goals here are:

  • For Word/Excel/PDF documents: detect when a document contains "code" or an OLE package and then block the upload process.
  • For image documents: sanitize the incoming image using a re-writing approach, which disables or removes any "code" present (this approach also handles the case in which the file sent is not an image).

Remarks:

  • It is technically possible to sanitize Word/Excel/PDF documents, but we have chosen to block them here in order to avoid the risk of missing an evasion technique and letting a malicious document through. Many ways exist to embed a macro into a Microsoft Office document.
  • The other reason why we have chosen to block is that, for Word/Excel, changing the document format (for example, saving every document to the DOCX/XLSX formats to be sure that no macro can be executed) can affect or break the document structure or rendering, depending on the API used.

Cases

Common code

The following code is shared by the code snippets proposed in the rest of this article.

Interfaces:

DocumentDetector

import java.io.File;

/**
 * Interface to define detection methods.
 *
 */
public interface DocumentDetector {
	/**
	 * Method to verify that the specified file contains a document that:<br>
	 * <ul>
	 * <li>Does not contain potentially malicious content</li>
	 * <li>Is in one of the supported accepted formats</li>
	 * </ul>
	 * 
	 * @param f File to validate
	 * 
	 * @return TRUE only if the file fulfills the 2 rules above
	 */
	boolean isSafe(File f);
}

DocumentSanitizer

import java.io.File;

/**
 * Interface to define sanitize methods.
 *
 */
public interface DocumentSanitizer {
	/**
	 * Method that tries to sanitize the specified file, i.e. to disable any code it contains, by using a re-writing approach.
	 * 
	 * @param f File to make safe
	 * 
	 * @return TRUE only if the specified file has been successfully made safe.
	 */
	boolean madeSafe(File f);
}
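
A minimal sketch of how an upload handler might combine the two interfaces above, following the approach of blocking documents and sanitizing images. The `UploadGate` class, its `accept` method and the category keys ("word", "pdf", "image"...) are hypothetical and not part of the prototype; the interfaces are repeated only so that the example compiles on its own, and how the caller determines the category of an incoming file is left out.

```java
import java.io.File;
import java.util.Locale;
import java.util.Map;

// Same contracts as the interfaces above, repeated to keep the sketch self-contained
interface DocumentDetector {
    boolean isSafe(File f);
}

interface DocumentSanitizer {
    boolean madeSafe(File f);
}

/** Hypothetical gate that chooses between "detect and block" and "sanitize" per file category. */
final class UploadGate {

    private final Map<String, DocumentDetector> detectors;
    private final Map<String, DocumentSanitizer> sanitizers;

    UploadGate(Map<String, DocumentDetector> detectors, Map<String, DocumentSanitizer> sanitizers) {
        this.detectors = detectors;
        this.sanitizers = sanitizers;
    }

    /**
     * @param category Category decided by the caller (assumption: lower-case keys such as "pdf")
     * @param f Uploaded file, already stored in a temporary location
     * @return TRUE only if the file can be kept
     */
    boolean accept(String category, File f) {
        String key = (category == null) ? "" : category.toLowerCase(Locale.US);
        DocumentDetector detector = detectors.get(key);
        if (detector != null) {
            // Word/Excel/PDF: block the upload when the detector flags the document
            return detector.isSafe(f);
        }
        DocumentSanitizer sanitizer = sanitizers.get(key);
        if (sanitizer != null) {
            // Images: keep the file only if it has been successfully re-written
            return sanitizer.madeSafe(f);
        }
        // Unknown category: reject by default
        return false;
    }
}
```

Rejecting unknown categories by default keeps the gate closed for any file type that has no detector or sanitizer registered.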

Case n°1: Word / Excel

The reasons why the Aspose APIs have been used here are the following:

  • There are many ways to embed a macro into a Microsoft Office document and, instead of manually supporting all the ways that exist in the wild (they evolve every day), we prefer to use features from a company that performs R&D on these formats, specifically the proprietary DOC and XLS native formats.
  • The open source API POI has limited support for the DOC native format.
  • The open source API JEXCELAPI for the XLS native format is not actively maintained (the last release dates from 2009).
  • The trial version of the APIs can be used for detection only, and it seems that there is no license limitation on this specific type of usage.

Detector for Word document:

import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.aspose.words.Document;
import com.aspose.words.FileFormatInfo;
import com.aspose.words.FileFormatUtil;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Shape;

/**
 * Implementation of the detector for Microsoft Word document.
 * 
 *
 */
public class WordDocumentDetectorImpl implements DocumentDetector {

	/** LOGGER */
	private static final Logger LOG = LoggerFactory.getLogger(WordDocumentDetectorImpl.class);

	/**
	 * List of allowed Word formats (WML = Word ML (Word 2003 XML)).<br>
	 * DOCM is also allowed because it can exist without a macro inside.<br>
	 * DOT/DOTM are also allowed because both can exist without a macro inside.<br>
	 * We reject MHTML files because:<br>
	 * <ul>
	 * <li>The API cannot detect macros in this format</li>
	 * <li>It is not normal to use this format to represent a Word file (there are plenty of other supported formats)</li>
	 * </ul>
	 */
	private static final List<String> ALLOWED_FORMAT = Arrays.asList(new String[] { "doc", "docx", "docm", "wml", "dot", "dotm" });

	/**
	 * {@inheritDoc}
	 *
	 * @see eu.righettod.poc.detector.DocumentDetector#isSafe(java.io.File)
	 */
	@SuppressWarnings("rawtypes")
	@Override
	public boolean isSafe(File f) {
		boolean safeState = false;
		try {
			if ((f != null) && f.exists() && f.canRead()) {
				// Perform a first check on Word document format
				FileFormatInfo formatInfo = FileFormatUtil.detectFileFormat(f.getAbsolutePath());
				String formatExtension = FileFormatUtil.loadFormatToExtension(formatInfo.getLoadFormat());
				if ((formatExtension != null) && ALLOWED_FORMAT.contains(formatExtension.toLowerCase(Locale.US).replaceAll("\\.", ""))) {
					// Load the file into the Word document parser
					Document document = new Document(f.getAbsolutePath());
					// Get safe state from Macro presence
					safeState = !document.hasMacros();
					// If document is safe then we pass to OLE objects analysis
					if (safeState) {
						// Get all shapes of the document
						NodeCollection shapes = document.getChildNodes(NodeType.SHAPE, true);
						Shape shape = null;
						// Search OLE objects in all shapes
						int totalOLEObjectCount = 0;
						for (int i = 0; i < shapes.getCount(); i++) {
							shape = (Shape) shapes.get(i);
							// Check if the current shape has OLE object
							if (shape.getOleFormat() != null) {
								totalOLEObjectCount++;
							}
						}
						// Update safe status flag according to number of OLE object found
						if (totalOLEObjectCount != 0) {
							safeState = false;
						}

					}
				}
			}
		}
		catch (Exception e) {
			safeState = false;
			LOG.warn("Error during Word file analysis !", e);
		}
		return safeState;
	}

}

Detector for Excel document:

import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.aspose.cells.FileFormatInfo;
import com.aspose.cells.FileFormatUtil;
import com.aspose.cells.MsoDrawingType;
import com.aspose.cells.OleObject;
import com.aspose.cells.Workbook;
import com.aspose.cells.Worksheet;

/**
 * Implementation of the detector for Microsoft Excel workbook.<br>
 * It mirrors the Word detector above and assumes the Aspose.Cells API (com.aspose.cells).
 *
 */
public class ExcelDocumentDetectorImpl implements DocumentDetector {

	/** LOGGER */
	private static final Logger LOG = LoggerFactory.getLogger(ExcelDocumentDetectorImpl.class);

	/**
	 * List of allowed Excel formats.<br>
	 * XLSM/XLSB are also allowed because both can exist without a macro inside.<br>
	 * XLT/XLTM are also allowed because both can exist without a macro inside.<br>
	 */
	private static final List<String> ALLOWED_FORMAT = Arrays.asList(new String[] { "xls", "xlsx", "xlsm", "xlsb", "xlt", "xltm" });

	/**
	 * {@inheritDoc}
	 *
	 * @see eu.righettod.poc.detector.DocumentDetector#isSafe(java.io.File)
	 */
	@Override
	public boolean isSafe(File f) {
		boolean safeState = false;
		try {
			if ((f != null) && f.exists() && f.canRead()) {
				// Perform a first check on Excel document format
				FileFormatInfo formatInfo = FileFormatUtil.detectFileFormat(f.getAbsolutePath());
				String formatExtension = FileFormatUtil.loadFormatToExtension(formatInfo.getLoadFormat());
				if ((formatExtension != null) && ALLOWED_FORMAT.contains(formatExtension.toLowerCase(Locale.US).replaceAll("\\.", ""))) {
					// Load the file into the Excel document parser
					Workbook book = new Workbook(f.getAbsolutePath());
					// Get safe state from Macro presence
					safeState = !book.hasMacro();
					// If document is safe then we pass to OLE objects analysis
					if (safeState) {
						// Search OLE objects in all workbook sheets
						Worksheet sheet = null;
						OleObject oleObject = null;
						int totalOLEObjectCount = 0;
						for (int i = 0; i < book.getWorksheets().getCount(); i++) {
							sheet = book.getWorksheets().get(i);
							for (int j = 0; j < sheet.getOleObjects().getCount(); j++) {
								oleObject = sheet.getOleObjects().get(j);
								// Count only the objects that are real OLE objects
								if (oleObject.getMsoDrawingType() == MsoDrawingType.OLE_OBJECT) {
									totalOLEObjectCount++;
								}
							}
						}
						// Update safe status flag according to number of OLE object found
						if (totalOLEObjectCount != 0) {
							safeState = false;
						}

					}
				}
			}
		}
		catch (Exception e) {
			safeState = false;
			LOG.warn("Error during Excel file analysis !", e);
		}
		return safeState;
	}

}


Case n°2: PDF

Detector for PDF document:

import java.io.File;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

/**
 * Implementation of the detector for Adobe PDF document.
 * 
 *
 */
public class PdfDocumentDetectorImpl implements DocumentDetector {

	/** LOGGER */
	private static final Logger LOG = LoggerFactory.getLogger(PdfDocumentDetectorImpl.class);

	/**
	 * {@inheritDoc}
	 *
	 * @see eu.righettod.poc.detector.DocumentDetector#isSafe(java.io.File)
	 */
	@Override
	public boolean isSafe(File f) {
		boolean safeState = false;
		PdfReader reader = null;
		try {
			if ((f != null) && f.exists() && f.canRead()) {
				// Load the file in the PDF parser
				// If the file is not a PDF then an exception will be thrown
				// here and safe state will stay set to FALSE
				reader = new PdfReader(f.getAbsolutePath());
				// Check 1:
				// Detect if the document contains any JavaScript code
				String jsCode = reader.getJavaScript();
				if (jsCode == null) {
					// OK no JS code, then we pass to check 2:
					// Detect if the document has any embedded files
					PdfDictionary root = reader.getCatalog();
					PdfDictionary names = root.getAsDict(PdfName.NAMES);
					PdfArray namesArray = null;
					if (names != null) {
						// The name dictionary can exist without an EmbeddedFiles entry,
						// so guard against null to avoid rejecting a safe document
						PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
						if (embeddedFiles != null) {
							namesArray = embeddedFiles.getAsArray(PdfName.NAMES);
						}
					}
					// Get safe state from number of embedded files
					safeState = ((namesArray == null) || namesArray.isEmpty());
				}
			}
		} catch (Exception e) {
			safeState = false;
			LOG.warn("Error during Pdf file analysis !", e);
		} finally {
			// Release the resources held by the parser
			if (reader != null) {
				reader.close();
			}
		}
		return safeState;
	}

}


Case n°3: Images

Sanitizer for Images files:

import ij.IJ;
import ij.ImagePlus;
import ij.io.Opener;
import ij.process.ImageProcessor;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Implementation of the sanitizer for Image file.
 * 
 * @see "http://stackoverflow.com/a/24747085"
 *
 */
public class ImageDocumentSanitizerImpl implements DocumentSanitizer {

	/** LOGGER */
	private static final Logger LOG = LoggerFactory.getLogger(ImageDocumentSanitizerImpl.class);

	/**
	 * {@inheritDoc}
	 *
	 * @see eu.righettod.poc.sanitizer.DocumentSanitizer#madeSafe(java.io.File)
	 */
	@Override
	public boolean madeSafe(File f) {
		boolean safeState = false;
		try {
			if ((f != null) && f.exists() && f.canRead() && f.canWrite()) {
				// Load image
				ImagePlus image = new Opener().openImage(f.getAbsolutePath());

				// Check that image has been successfully loaded
				if (image == null) {
					throw new Exception("Cannot load the original image !");
				}

				// Get current Width and Height of the image
				int originalWidth = image.getWidth();
				int originalHeight = image.getHeight();

				// Obtain an Image processor on this image
				ImageProcessor originalImageProcessor = image.getProcessor();
				if (originalImageProcessor == null) {
					throw new Exception("Cannot obtain an image processor for the original image !");
				}

				// Resize the image by removing 1px on Width and Height
				ImageProcessor resizedImageProcessor = originalImageProcessor.resize(originalWidth - 1, originalHeight - 1);
				if (resizedImageProcessor == null) {
					throw new Exception("Cannot resize the original image !");
				}

				// Resize the resized image by adding 1px on Width and Height
				// In fact this sets the image back to its initial size
				ImageProcessor initialSizedImageProcessor = resizedImageProcessor.resize(originalWidth, originalHeight);
				if (initialSizedImageProcessor == null) {
					throw new Exception("Cannot restore the initial size of the original image !");
				}

				// Save image and detect the image format for provided file
				String imageFormat = Opener.getFileFormat(f.getAbsolutePath());
				ImagePlus finalImg = new ImagePlus("", initialSizedImageProcessor);
				IJ.saveAs(finalImg, imageFormat, f.getAbsolutePath());

				// IJ will save the file with the extension associated with the image format (ex: jpg or png)
				// but, as the provided input file can have any extension (we do not use it to detect the image format),
				// we must manage the case in which 2 files exist at this point:
				// 1) The input file provided (ex: myfile.tmp)
				// 2) The new file saved by IJ (ex: myfile.png)
				String tmp = f.getName();
				String newSavedFileName = tmp.substring(0, tmp.lastIndexOf(".") + 1) + imageFormat;
				File newSavedFile = new File(f.getParentFile(), newSavedFileName);
				if (newSavedFile.exists() && !f.getAbsolutePath().equalsIgnoreCase(newSavedFile.getAbsolutePath())) {
					// Overwrite content of the input file with the content of the new saved file
					Files.copy(newSavedFile.toPath(), f.toPath(), StandardCopyOption.REPLACE_EXISTING);
					// Remove file saved by IJ
					newSavedFile.delete();
				}

				// Set state flag
				safeState = true;
			}

		}
		catch (Exception e) {
			safeState = false;
			LOG.warn("Error during Image file processing !", e);
		}

		return safeState;
	}

}
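
The re-writing idea can also be sketched with the JDK alone. The example below is not the prototype's method: it swaps ImageJ for `javax.imageio.ImageIO`, skips the resize step, and always re-encodes to PNG, and the class `ImageRewriteSketch` and its method are hypothetical names. It decodes the pixels and encodes them again, so bytes that are not part of the decoded pixel data (for instance a payload appended after the end of the image) are not copied to the output, and input that cannot be decoded as an image is rejected.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import javax.imageio.ImageIO;

/** Hypothetical JDK-only illustration of the re-writing approach (always outputs PNG). */
final class ImageRewriteSketch {

    private ImageRewriteSketch() {
    }

    /**
     * Decodes the input and re-encodes only its pixels as PNG.
     *
     * @param input Raw bytes of the uploaded file
     * @return Bytes of a newly written PNG image
     * @throws IOException If the input cannot be decoded as an image or cannot be re-encoded
     */
    static byte[] rewriteAsPng(byte[] input) throws IOException {
        // ImageIO.read returns null when no registered reader recognizes the content
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(input));
        if (decoded == null) {
            throw new IOException("Input is not a decodable image");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is available
        if (!ImageIO.write(decoded, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

Always writing PNG is a simplification made for this sketch; the sanitizer above keeps the detected format of the original file instead.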


Sources of the prototype

Github repository: https://github.com/righettod/document-upload-protection

Authors and Primary Editors

Dominique Righetto - dominique.righetto@owasp.org
