Protect FileUpload Against Malicious File



{| style="padding: 0;margin:0;margin-top:10px;text-align:left;"
|-
| valign="top" style="border-right: 1px dotted gray;padding-right:25px;" |
Last revision (mm/dd/yy): //

= Introduction =

This article proposes a way to protect a file upload feature against the submission of files containing malicious code.

== Context ==
In web applications, when we expect users to upload working documents, we expose the application to the submission of documents that we can categorize as malicious.

We use the term "malicious" here to refer to documents that embed malicious code that will be executed when another user (admin, back office operator...) opens the document with the associated reader application.

Usually, when an application expects its users to upload a document, it expects to receive a document intended for reading, printing, or archiving. The document should not alter its content when it is opened and should be in a final rendered state.

The file types most commonly used to deliver malicious code through a file upload feature are the following:


 * Microsoft Office documents: Word/Excel/PowerPoint using VBA macros and OLE packages.
 * Adobe PDF documents: malicious code inserted as an attachment.
 * Images: malicious code embedded into the file, or a binary file given an image file extension.

== Approaches ==
Based on this context, the goals here are:


 * For Word/Excel/PowerPoint/PDF documents: detect whether a document contains "code" or an OLE package and, if so, block the upload.
 * For images: sanitize the incoming image by rewriting it, which disables or removes any "code" present (this approach also handles the case in which the file sent is not an image).

Remarks:


 * It is technically possible to sanitize Word/Excel/PowerPoint/PDF documents, but we have chosen to block them instead, in order to avoid the risk of missing an evasion technique and letting a malicious document through. The following site shows how many ways exist to embed macros into a Microsoft Office document.
 * The other reason why we have chosen to block is that, for Word/Excel/PowerPoint, changing the document format (for example, saving every document as DOCX/XLSX/PPTX/PPSX to be sure that no macro can be executed) can affect or break the document structure or rendering, depending on the API used.

== Common code ==
The following code is shared by the code snippets proposed in the rest of this article.

Interfaces:

DocumentDetector

DocumentSanitizer
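The bodies of the two interfaces are not reproduced here, so the following is only a minimal sketch of what such contracts could look like. The method names `isSafe` and `madeSafe`, the holder class `UploadContracts`, and the helper `isUsable` are assumptions made for illustration, not necessarily the names used in the prototype.

```java
import java.io.File;

/** Hypothetical holder class so that the sketch compiles as a single file. */
final class UploadContracts {

    /** Detection contract: inspect a document without modifying it. */
    interface DocumentDetector {
        /** Returns true only if the file was inspected and nothing suspicious was found. */
        boolean isSafe(File f);
    }

    /** Sanitizing contract: rewrite a document in place so that it becomes safe. */
    interface DocumentSanitizer {
        /** Returns true only if the file was successfully rewritten. */
        boolean madeSafe(File f);
    }

    /** Fail-closed precondition shared by implementations: reject null, missing, unreadable or empty files. */
    static boolean isUsable(File f) {
        return f != null && f.exists() && f.isFile() && f.canRead() && f.length() > 0;
    }

    private UploadContracts() {
        // Static holder, not meant to be instantiated.
    }
}
```

Both methods return a boolean that is true only on a positive result, so any error inside an implementation can be mapped to false and the upload is rejected by default.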

== Case n°1: Word / Excel / PowerPoint ==
The Aspose API is used here for the following reasons:


 * There are many ways to embed macros into a Microsoft Office document and, instead of manually supporting every technique found in the wild (they evolve every day), we prefer to rely on a company that performs R&D on these formats, specifically the native DOC/XLS/PPT formats, which are proprietary.
 * The open source POI API offers only limited support for the native DOC format.
 * The open source JEXCELAPI library for the native XLS format is rarely maintained (its last release dates from 2009).

Detector for Word document:

Detector for Excel document:

Detector for Powerpoint document:
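The Aspose-based detectors are not reproduced here. As a rough illustration of the "detect and block" idea only, the sketch below uses nothing but the Java standard library and relies on the fact that DOCX/XLSX/PPTX files are ZIP packages in which VBA macros are stored in a part named `vbaProject.bin` and embedded OLE objects usually sit under an `embeddings/` folder. It is not the Aspose approach and it deliberately rejects the legacy binary DOC/XLS/PPT formats, which it cannot inspect. The class name `OoxmlMacroHeuristic` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical stdlib-only heuristic for OOXML packages (DOCX/XLSX/PPTX). */
final class OoxmlMacroHeuristic {

    /** Returns true only if the file is an OOXML package without VBA project or embedded objects. */
    static boolean isSafe(File f) {
        if (f == null || !f.isFile()) {
            return false;
        }
        try (ZipFile zip = new ZipFile(f)) {
            boolean sawContentTypes = false;
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName().toLowerCase(Locale.ROOT);
                if (name.equals("[content_types].xml")) {
                    // Mandatory part of every OOXML package.
                    sawContentTypes = true;
                }
                if (name.endsWith("vbaproject.bin") || name.contains("/embeddings/")) {
                    // Macro storage or embedded (OLE) object found: block the upload.
                    return false;
                }
            }
            // A ZIP without the content types part is not an OOXML document.
            return sawContentTypes;
        } catch (IOException e) {
            // Not a ZIP archive (legacy binary format, corrupted or disguised file): fail closed.
            return false;
        }
    }

    private OoxmlMacroHeuristic() {
    }
}
```

Because the check is purely name-based, it should be read as a first filter rather than as a replacement for a format-aware parser.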

== Case n°2: PDF ==
Detector for PDF document:
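The original detector code is not reproduced here. The sketch below is a much cruder, standard-library-only stand-in: it checks the `%PDF-` header and then scans the raw bytes for a few PDF name tokens associated with attachments and active content (`/EmbeddedFile`, `/JavaScript`, `/Launch`, `/OpenAction`). The class name `PdfKeywordHeuristic` and the token list are assumptions. A raw scan can be evaded (for example through compressed object streams or hex-escaped names), so a real detector should parse the document with a PDF library.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical keyword-based PDF check; a heuristic, not a parser. */
final class PdfKeywordHeuristic {

    // Name tokens linked to attachments or automatically triggered actions (assumed list).
    private static final String[] SUSPICIOUS = {
        "/EmbeddedFile", "/JavaScript", "/Launch", "/OpenAction"
    };

    /** Returns true only if the file looks like a PDF and none of the tokens is present. */
    static boolean isSafe(File f) {
        if (f == null || !f.isFile()) {
            return false;
        }
        try {
            // ISO-8859-1 maps every byte to one char, so ASCII tokens survive the conversion.
            // Reading the whole file in memory is acceptable for a sketch only.
            String content = new String(Files.readAllBytes(f.toPath()), StandardCharsets.ISO_8859_1);
            if (!content.startsWith("%PDF-")) {
                // Not a PDF at all: block.
                return false;
            }
            for (String token : SUSPICIOUS) {
                if (content.contains(token)) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            // Unreadable file: fail closed.
            return false;
        }
    }

    private PdfKeywordHeuristic() {
    }
}
```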

== Case n°3: Images ==
Sanitizer for image files:
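The sanitizer code is not reproduced here. The sketch below illustrates the rewriting approach with `javax.imageio` only: the file is decoded to pixels and re-encoded in the same format, so data that is not part of the decoded image (trailing bytes, metadata chunks) is not carried over, and a file that cannot be decoded as an image is rejected. The class name `ImageRewriteSanitizer` is hypothetical, and the prototype may apply additional transformations that are not shown here.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical image sanitizer based on decode and re-encode. */
final class ImageRewriteSanitizer {

    /** Rewrites the image in place; returns false if the file is not a decodable image. */
    static boolean madeSafe(File f) {
        if (f == null || !f.isFile()) {
            return false;
        }
        File tmp = null;
        try {
            String format;
            BufferedImage image;
            try (ImageInputStream in = ImageIO.createImageInputStream(f)) {
                if (in == null) {
                    return false;
                }
                // The format is detected from the content, never from the file extension.
                Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
                if (!readers.hasNext()) {
                    return false;
                }
                ImageReader reader = readers.next();
                try {
                    reader.setInput(in);
                    format = reader.getFormatName();
                    image = reader.read(0);
                } finally {
                    reader.dispose();
                }
            }
            // Re-encode only the pixel data into a fresh file, then replace the upload.
            tmp = File.createTempFile("sanitized", ".img");
            if (!ImageIO.write(image, format, tmp)) {
                return false;
            }
            Files.move(tmp.toPath(), f.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException | RuntimeException e) {
            // Any decoding or writing problem means the file is not trusted.
            return false;
        } finally {
            if (tmp != null) {
                tmp.delete(); // No-op when the move succeeded.
            }
        }
    }

    private ImageRewriteSanitizer() {
    }
}
```

When `madeSafe` returns false, the caller is expected to discard the uploaded file rather than store it.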

== Sources of the prototype ==
GitHub repository: https://github.com/righettod/document-upload-protection

= Authors and Primary Editors =

Dominique Righetto - dominique.righetto@owasp.org

= Other Cheatsheets =


|}