Difference between revisions of "Fingerprint Web Application (OTG-INFO-009)"

From OWASP
 
(6 intermediate revisions by 3 users not shown)
Line 2: Line 2:
  
 
== Summary ==
 
== Summary ==
Web server fingerprinting is a critical task for the Penetration tester. Knowing the version and type of a running web server allows testers to determine known vulnerabilities and the appropriate exploits to use during testing.
 
  
There are several different vendors and versions of web servers on the market today. Knowing the type of web server that you are testing significantly helps in the testing process, and will also change the course of the test. This information can be derived by sending the web server specific commands and analyzing the output, as each version of web server software may respond differently to these commands. By knowing how each type of web server responds to specific commands and keeping this information in a web server fingerprint database, a penetration tester can send these commands to the web server, analyze the response, and compare it to the database of known signatures. Please note that it usually takes several different commands to accurately identify the web server, as different versions may react similarly to the same command. Rarely, however, different versions react the same to all HTTP commands. So, by sending several different commands, you increase the accuracy of your guess.
+
There is nothing new under the sun, and nearly every web application that one may think of developing has already been developed. With the vast number of free and open source software projects that are actively developed and deployed around the world, it is very likely that an application security test will face a target site that is entirely or partly dependent on these well known applications (e.g. Wordpress, phpBB, Mediawiki, etc). Knowing the web application components that are being tested significantly helps in the testing process and will also drastically reduce the effort required during the test. These well known web applications have known HTML headers, cookies, and directory structures that can be enumerated to identify the application.  
 +
 
  
 
== Test Objectives ==
 
== Test Objectives ==
  
Identify the version and type of the running web server to determine known vulnerabilities and the appropriate exploits to use during the testing.
+
Identify the web application and version to determine known vulnerabilities and the appropriate exploits to use during testing.
 +
 
  
 
== How to Test ==
 
== How to Test ==
  
=== Black Box testing and example ===
+
=== Cookies ===
The simplest and most basic form of identifying a Web server is to look at the Server field in the HTTP response header. For our experiments we use netcat.  
+
A relatively reliable way to identify a web application is by the application-specific cookies.
Consider the following HTTP Request-Response:  
+
 
 +
Consider the following HTTP-request:
 +
 
 
<pre>
 
<pre>
$ nc 202.41.76.251 80
+
GET / HTTP/1.1
HEAD / HTTP/1.0
+
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
 +
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
 +
Accept-Language: en-US,en;q=0.5
 +
'''Cookie: wp-settings-time-1=1406093286; wp-settings-time-2=1405988284'''
 +
DNT: 1
 +
Connection: keep-alive
 +
Host: blog.owasp.org
 +
</pre>
 +
 
 +
The cookies ''wp-settings-time-1'' and ''wp-settings-time-2'' are sent with the request, which indicates that the site runs WordPress. A list of common cookie names is presented in the Common Application Identifiers section. However, it is possible to change the name of a cookie.
  
HTTP/1.1 200 OK
 
Date: Mon, 16 Jun 2003 02:53:29 GMT
 
Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
 
Last-Modified: Wed, 07 Oct 1998 11:18:14 GMT
 
ETag: "1813-49b-361b4df6"
 
Accept-Ranges: bytes
 
Content-Length: 1179
 
Connection: close
 
Content-Type: text/html
 
</pre>
 
  
From the ''Server'' field, we understand that the server is likely Apache, version 1.3.3, running on Linux operating system.
+
=== HTML source code ===
 +
This technique is based on finding certain patterns in the HTML page source code. Often one can find a lot of information which helps a tester to recognize a specific web application. One of the common markers are HTML comments that directly lead to application disclosure. More often certain application-specific paths can be found, i.e. links to application-specific css and/or js folders. Finally, specific script variables might also point to a certain application.
  
Four examples of the HTTP response headers are shown below.
+
From the meta tag below, one can easily learn the application used by a website and its version. The comment, specific paths and script variables can all help an attacker to quickly determine an instance of an application.
  
From an '''Apache 1.3.23''' server:
 
 
<pre>
 
<pre>
HTTP/1.1 200 OK
+
<meta name="generator" content="WordPress 3.9.2" />
Date: Sun, 15 Jun 2003 17:10: 49 GMT
 
Server: Apache/1.3.23
 
Last-Modified: Thu, 27 Feb 2003 03:48: 19 GMT
 
ETag: 32417-c4-3e5d8a83
 
Accept-Ranges: bytes
 
Content-Length: 196
 
Connection: close
 
Content-Type: text/HTML
 
 
</pre>
 
</pre>
  
From a '''Microsoft IIS 5.0''' server:
+
More frequently such information is placed between <head></head> tags, in <meta> tags or at the end of the page. Nevertheless, it is recommended to check the whole document since it can be useful for other purposes such as inspection of other useful comments and hidden fields.  
<pre>
+
 
HTTP/1.1 200 OK
+
=== Specific files and folders ===
Server: Microsoft-IIS/5.0
+
Apart from information gathered from HTML sources, there is another approach which greatly helps an attacker to determine the application with high accuracy. Every application has its own specific file and folder structure on the server. It has been pointed out that one can see the specific path from the HTML page source but sometimes they are not explicitly presented there and still reside on the server.
Expires: Tue, 17 Jun 2003 01:41: 33 GMT
 
Date: Mon, 16 Jun 2003 01:41: 33 GMT
 
Content-Type: text/HTML  
 
Accept-Ranges: bytes
 
Last-Modified: Wed, 28 May 2003 15:32: 21 GMT
 
ETag: b0aac0542e25c31: 89d
 
Content-Length: 7369
 
</pre>
 
  
From a '''Netscape Enterprise 4.1''' server:
+
To uncover them, a technique known as dirbusting is used. Dirbusting is brute forcing a target with predictable folder and file names and monitoring the HTTP responses to enumerate server contents. This information can be used both for finding default files and attacking them, and for fingerprinting the web application. Dirbusting can be done in several ways; the example below shows a successful dirbusting attack against a WordPress-powered target using a predefined list and the Intruder functionality of Burp Suite.
<pre>
 
HTTP/1.1 200 OK
 
Server: Netscape-Enterprise/4.1
 
Date: Mon, 16 Jun 2003 06:19: 04 GMT
 
Content-type: text/HTML
 
Last-modified: Wed, 31 Jul 2002 15:37: 56 GMT
 
Content-length: 57
 
Accept-ranges: bytes
 
Connection: close
 
</pre>
 
  
From a '''SunONE 6.1''' server:
+
[[Image:Wordpress_dirbusting.png]]
<pre>
 
HTTP/1.1 200 OK
 
Server: Sun-ONE-Web-Server/6.1
 
Date: Tue, 16 Jan 2007 14:53:45 GMT
 
Content-length: 1186
 
Content-type: text/html
 
Date: Tue, 16 Jan 2007 14:50:31 GMT
 
Last-Modified: Wed, 10 Jan 2007 09:58:26 GMT
 
Accept-Ranges: bytes
 
Connection: close
 
</pre>
 
However, this testing methodology is not so good. There are several techniques that allow a web site to obfuscate or to modify the server banner string.
 
For example we could obtain the following answer:
 
<pre>
 
HTTP/1.1 403 Forbidden
 
Date: Mon, 16 Jun 2003 02:41: 27 GMT
 
Server: Unknown-Webserver/1.0
 
Connection: close
 
Content-Type: text/HTML; charset=iso-8859-1
 
</pre>
 
  
In this case, the server field of that response is obfuscated: we cannot know what type of web server is running.
+
We can see that for some WordPress-specific folders (for instance, /wp-includes/, /wp-admin/ and /wp-content/) the HTTP responses are 403 (Forbidden), 302 (Found, redirection to wp-login.php) and 200 (OK) respectively. This is a good indicator that the target is WordPress-powered. In the same way it is possible to dirbust different application plugin folders and their versions. The screenshot below shows a typical CHANGELOG file of a Drupal plugin, which provides information on the application being used and discloses a vulnerable plugin version.
  
==== Protocol behaviour ====
+
[[Image:Drupal_botcha_disclosure.png]]
More refined techniques take in consideration various characteristics of the several web servers available on the market. We will list some methodologies that allow us to deduce the type of web server in use.
 
  
'''HTTP header field ordering'''
+
Tip: before starting dirbusting, it is recommended to check the robots.txt file first. Sometimes application specific folders and other sensitive information can be found there as well. An example of such a robots.txt file is presented on a screenshot below.
  
The first method consists of observing the ordering of the several headers in the response. Every web server has an inner ordering of the header. We consider the following answers as an example:
+
[[Image:Robots-info-disclosure.png]]
  
Response from '''Apache 1.3.23'''
+
Specific files and folders are different for each specific application. It is recommended to install the corresponding application during penetration tests in order to have better understanding of what infrastructure is presented and what files might be left on the server. However, several good file lists already exist and one good example is FuzzDB wordlists of predictable files/folders (http://code.google.com/p/fuzzdb/).
<pre>
 
$ nc apache.example.com 80
 
HEAD / HTTP/1.0
 
  
HTTP/1.1 200 OK
+
== Common Application Identifiers ==
Date: Sun, 15 Jun 2003 17:10: 49 GMT
+
=== Cookies ===
Server: Apache/1.3.23
 
Last-Modified: Thu, 27 Feb 2003 03:48: 19 GMT
 
ETag: 32417-c4-3e5d8a83
 
Accept-Ranges: bytes
 
Content-Length: 196
 
Connection: close
 
Content-Type: text/HTML
 
</pre>
 
Response from '''IIS 5.0'''
 
<pre>
 
$ nc iis.example.com 80
 
HEAD / HTTP/1.0
 
  
HTTP/1.1 200 OK
+
{| class="wikitable"
Server: Microsoft-IIS/5.0
+
|-
Content-Location: http://iis.example.com/Default.htm
+
| phpBB || phpbb3_
Date: Fri, 01 Jan 1999 20:13: 52 GMT
+
|-
Content-Type: text/HTML
+
| Wordpress || wp-settings
Accept-Ranges: bytes
+
|-
Last-Modified: Fri, 01 Jan 1999 20:13: 52 GMT
+
| 1C-Bitrix || BITRIX_
ETag: W/e0d362a4c335be1: ae1
+
|-
Content-Length: 133
+
| AMPcms || AMP
</pre>
+
|-
Response from '''Netscape Enterprise 4.1'''
+
| Django CMS || django
<pre>
+
|-
$ nc netscape.example.com 80
+
| DotNetNuke || DotNetNukeAnonymous
HEAD / HTTP/1.0
+
|-
 +
| e107 || e107_tz
 +
|-
 +
| EPiServer || EPiTrace, EPiServer
 +
|-
 +
| Graffiti CMS || graffitibot
 +
|-
 +
| Hotaru CMS || hotaru_mobile
 +
|-
 +
| ImpressCMS || ICMSession
 +
|-
 +
| Indico || MAKACSESSION
 +
|-
 +
| InstantCMS || InstantCMS[logdate]
 +
|-
 +
| Kentico CMS || CMSPreferredCulture
 +
|-
 +
| MODx || SN4[12symb]
 +
|-
 +
| TYPO3 || fe_typo_user
 +
|-
 +
| Dynamicweb || Dynamicweb
 +
|-
 +
| LEPTON || lep[some_numeric_value]+sessionid
 +
|-
 +
| Wix || Domain=.wix.com
 +
|-
 +
| VIVVO || VivvoSessionId
 +
|}
  
HTTP/1.1 200 OK
 
Server: Netscape-Enterprise/4.1
 
Date: Mon, 16 Jun 2003 06:01: 40 GMT
 
Content-type: text/HTML
 
Last-modified: Wed, 31 Jul 2002 15:37: 56 GMT
 
Content-length: 57
 
Accept-ranges: bytes
 
Connection: close
 
</pre>
 
Response from a '''SunONE 6.1'''
 
<pre>
 
$ nc sunone.example.com 80
 
HEAD / HTTP/1.0
 
  
HTTP/1.1 200 OK
+
=== HTML source code ===
Server: Sun-ONE-Web-Server/6.1
 
Date: Tue, 16 Jan 2007 15:23:37 GMT
 
Content-length: 0
 
Content-type: text/html
 
Date: Tue, 16 Jan 2007 15:20:26 GMT
 
Last-Modified: Wed, 10 Jan 2007 09:58:26 GMT
 
Connection: close
 
</pre>
 
We can notice that the ordering of the ''Date'' field and the ''Server'' field differs between Apache, Netscape Enterprise, and IIS.
 
  
'''Malformed requests test'''
+
{| class="wikitable"
 +
|-
 +
! Application !! Keyword
 +
|-
 +
| Wordpress || <meta name="generator" content="WordPress 3.9.2" />
 +
|-
 +
| phpBB || <body id="phpbb"
 +
|-
 +
| Mediawiki || <meta name="generator" content="MediaWiki 1.21.9" />
 +
|-
 +
| Joomla || <meta name="generator" content="Joomla! - Open Source Content Management" />
 +
|-
 +
| Drupal || <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
 +
|-
 +
| DotNetNuke || DNN Platform - http://www.dnnsoftware.com
 +
|}
  
Another useful test to execute involves sending malformed requests or requests of nonexistent pages to the server.
+
More info https://www.owasp.org/index.php/Web-metadata
Consider the following HTTP responses.  
 
  
Response from '''Apache 1.3.23'''
+
== Tools ==
<pre>
+
A list of general and well-known tools is presented below. There are also a lot of other utilities, as well as framework-based fingerprinting tools.
$ nc apache.example.com 80
+
GET / HTTP/3.0
 
  
HTTP/1.1 400 Bad Request
+
=== WhatWeb ===
Date: Sun, 15 Jun 2003 17:12: 37 GMT
+
Website: http://www.morningstarsecurity.com/research/whatweb <br>
Server: Apache/1.3.23
+
Currently one of the best fingerprinting tools on the market. Included in a default [[Kali Linux]] build.
Connection: close
+
Language: Ruby
Transfer: chunked
+
Matches for fingerprinting are made with:
Content-Type: text/HTML; charset=iso-8859-1
+
* Text strings (case sensitive)
</pre>
+
* Regular expressions
Response from '''IIS 5.0'''
+
* Google Hack Database queries (limited set of keywords)
<pre>
+
* MD5 hashes
$ nc iis.example.com 80
+
* URL recognition
GET / HTTP/3.0
+
* HTML tag patterns
 +
* Custom ruby code for passive and aggressive operations
  
HTTP/1.1 200 OK
 
Server: Microsoft-IIS/5.0
 
Content-Location: http://iis.example.com/Default.htm
 
Date: Fri, 01 Jan 1999 20:14: 02 GMT
 
Content-Type: text/HTML
 
Accept-Ranges: bytes
 
Last-Modified: Fri, 01 Jan 1999 20:14: 02 GMT
 
ETag: W/e0d362a4c335be1: ae1
 
Content-Length: 133
 
</pre>
 
Response from '''Netscape Enterprise 4.1'''
 
<pre>
 
$ nc netscape.example.com 80
 
GET / HTTP/3.0
 
  
HTTP/1.1 505 HTTP Version Not Supported
+
Sample output is presented on a screenshot below:
Server: Netscape-Enterprise/4.1
 
Date: Mon, 16 Jun 2003 06:04: 04 GMT
 
Content-length: 140
 
Content-type: text/HTML
 
Connection: close
 
</pre>
 
Response from a '''SunONE 6.1'''
 
<pre>
 
$ nc sunone.example.com 80
 
GET / HTTP/3.0
 
  
HTTP/1.1 400 Bad request
+
[[Image:whatweb-sample.png]]
Server: Sun-ONE-Web-Server/6.1
 
Date: Tue, 16 Jan 2007 15:25:00 GMT
 
Content-length: 0
 
Content-type: text/html
 
Connection: close
 
</pre>
 
We notice that every server answers in a different way. The answer also differs between versions of the server. Similar observations can be made if we create requests with a non-existent protocol. Consider the following responses:
 
  
Response from '''Apache 1.3.23'''
 
<pre>
 
$ nc apache.example.com 80
 
GET / JUNK/1.0
 
  
HTTP/1.1 200 OK
 
Date: Sun, 15 Jun 2003 17:17: 47 GMT
 
Server: Apache/1.3.23
 
Last-Modified: Thu, 27 Feb 2003 03:48: 19 GMT
 
ETag: 32417-c4-3e5d8a83
 
Accept-Ranges: bytes
 
Content-Length: 196
 
Connection: close
 
Content-Type: text/HTML
 
</pre>
 
Response from '''IIS 5.0'''
 
<pre>
 
$ nc iis.example.com 80
 
GET / JUNK/1.0
 
  
HTTP/1.1 400 Bad Request
+
=== BlindElephant ===
Server: Microsoft-IIS/5.0
+
Website: https://community.qualys.com/community/blindelephant <br>
Date: Fri, 01 Jan 1999 20:14: 34 GMT
+
This great tool works on the principle of static file checksum based version difference thus providing a very high quality of fingerprinting.
Content-Type: text/HTML
+
Language: Python
Content-Length: 87
 
</pre>
 
Response from '''Netscape Enterprise 4.1'''
 
<pre>
 
$ nc netscape.example.com 80
 
GET / JUNK/1.0
 
  
<HTML><HEAD><TITLE>Bad request</TITLE></HEAD>
+
Sample output of a successful fingerprint:
<BODY><H1>Bad request</H1>
 
Your browser sent a query this server could not understand.
 
</BODY></HTML>
 
</pre>
 
Response from a '''SunONE 6.1'''
 
 
<pre>
 
<pre>
$ nc sunone.example.com 80
+
pentester$ python BlindElephant.py http://my_target drupal
GET / JUNK/1.0
+
Loaded /Library/Python/2.7/site-packages/blindelephant/dbs/drupal.pkl with 145 versions, 478 differentiating paths, and 434 version groups.
 +
Starting BlindElephant fingerprint for version of drupal at http://my_target
  
<HTML><HEAD><TITLE>Bad request</TITLE></HEAD>
+
Hit http://my_target/CHANGELOG.txt
<BODY><H1>Bad request</H1>
+
File produced no match. Error: Retrieved file doesn't match known fingerprint. 527b085a3717bd691d47713dff74acf4
Your browser sent a query this server could not understand.
 
</BODY></HTML>
 
</pre>
 
  
== Tools ==
+
Hit http://my_target/INSTALL.txt
* httprint - http://net-square.com/httprint.html
+
File produced no match. Error: Retrieved file doesn't match known fingerprint. 14dfc133e4101be6f0ef5c64566da4a4
* httprecon - http://www.computec.ch/projekte/httprecon/
 
* Netcraft - http://www.netcraft.com
 
* Desenmascarame - http://desenmascara.me
 
* Shodan - http://www.shodanhq.com
 
* Nmap - http://nmap.org
 
  
=== Automated Testing ===
+
Hit http://my_target/misc/drupal.js
Rather than rely on manual bannering and analysis of the web server headers, a tester can use automated tools to achieve the same purpose. The tests to carry out in order to accurately fingerprint a web server can be many. Luckily, there are tools that automate these tests. "''httprint''" is one of such tools. httprint has a signature dictionary that allows one to recognize the type and the version of the web server in use.<br>
+
Possible versions based on result: 7.12, 7.13, 7.14
An example of running httprint is shown below:<br><br>
 
  
[[Image:httprint.jpg |800px|]]
+
Hit http://my_target/MAINTAINERS.txt
 +
File produced no match. Error: Retrieved file doesn't match known fingerprint. 36b740941a19912f3fdbfcca7caa08ca
  
 +
Hit http://my_target/themes/garland/style.css
 +
Possible versions based on result: 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11, 7.12, 7.13, 7.14
  
[http://www.nmap.org Nmap] version detection offers many advanced features that help determine which services are running on a given host. It obtains its data by connecting to open ports and interrogating them with probes that the specific services understand. The following example shows how Nmap connected to port 80 in order to fingerprint the service and its current version:
+
...
  
<pre>
+
Fingerprinting resulted in:
localhost$ nmap -sV example.com
+
7.14
Starting Nmap 6.40 ( http://nmap.org ) at 2013-09-21 13:20 GST
 
Nmap scan report for example.com (127.0.0.1)
 
Host is up (0.028s latency).
 
Not shown: 997 filtered ports
 
PORT    STATE  SERVICE    VERSION
 
80/tcp  open  http      Microsoft IIS httpd 6.0
 
Service Info: OS: Windows; CPE: cpe:/o:microsoft:windows
 
</pre>
 
  
=== Online Testing ===
 
Online tools can be used if the tester wishes to test more stealthily and doesn't wish to connect directly to the target website. Examples of online tools that often deliver a lot of information about the target web server are [http://www.netcraft.com Netcraft] and [http://www.shodanhq.com SHODAN].
 
  
With [http://www.netcraft.com Netcraft] we can retrieve information about operating system, web server used, Server Uptime, Netblock Owner, history of change related to Web server and O.S.<br> An example is shown below:
+
Best Guess: 7.14
<br><br>
+
</pre>
  
[[Image:netcraft2.png |800px|]]
 
  
 +
=== Wappalyzer ===
 +
Website: http://wappalyzer.com <br>
 +
Wappalyzer is a Firefox and Chrome plug-in. It relies only on regular expression matching and needs nothing other than the page to be loaded in the browser. It works completely at the browser level and gives results in the form of icons. Although it sometimes produces false positives, it is very handy for getting an idea of what technologies were used to build a target website immediately after browsing a page.
  
[http://www.shodanhp.com SHODAN] combines an HTTP port scanner with a search engine index of the HTTP responses, making it trivial to find specific web servers. Shodan collects data mostly on web servers at the moment (HTTP port 80), but there is also some data from FTP (21), SSH (22) Telnet (23), SNMP (161) and SIP (5060) services. <br> An example is shown below:
 
<br><br>
 
  
[[File:Shodan.png |800px|]]
+
Sample output of a plug-in is presented on a screenshot below.
  
 +
[[Image:Owasp-wappalyzer.png]]
  
[[OWASP Unmaskme Project]] is expected to become another online tool for fingerprinting any website, with an overall interpretation of all the [[Web-metadata]] extracted. The idea behind this project is that anyone in charge of a website could test the metadata their site is showing to the world and assess it from a security point of view.
 
While this project is being developed, you can test a [http://desenmascara.me/ Spanish Proof of Concept of this idea].
 
  
== Vulnerability References ==
+
== References ==
 
'''Whitepapers'''<br>
 
'''Whitepapers'''<br>
 
* Saumil Shah: "An Introduction to HTTP fingerprinting" - http://www.net-square.com/httprint_paper.html
 
* Saumil Shah: "An Introduction to HTTP fingerprinting" - http://www.net-square.com/httprint_paper.html
 
* Anant Shrivastava : "Web Application Finger Printing" - http://anantshri.info/articles/web_app_finger_printing.html
 
* Anant Shrivastava : "Web Application Finger Printing" - http://anantshri.info/articles/web_app_finger_printing.html
* Nmap "Service and Application Version Detection" - http://nmap.org/book/vscan.html
+
 
  
 
== Remediation ==
 
== Remediation ==
 +
The general advice is to use several of the tools described above and check logs to better understand what exactly helps an attacker to disclose the web framework. By performing multiple scans after changes have been made to hide framework tracks, it's possible to achieve a better level of security and to make sure the framework cannot be detected by automatic scans. Below are some specific recommendations by framework marker location and some additional interesting approaches.
 +
 +
 +
==== HTTP headers ====
 +
Check the configuration and disable or obfuscate all HTTP headers that disclose information about the technologies used. Here is an interesting article about HTTP header obfuscation using Netscaler:
 +
http://grahamhosking.blogspot.ru/2013/07/obfuscating-http-header-using-netscaler.html
 +
 +
 +
==== Cookies ====
 +
It is recommended to change cookie names by making changes in the corresponding configuration files.
 +
 +
 +
==== HTML source code ====
 +
Manually check the contents of the HTML code and remove everything that explicitly points to the framework.
 +
 +
General guidelines:
 +
*Make sure there are no visual markers disclosing the framework
 +
*Remove any unnecessary comments (copyrights, bug information, specific framework comments)
 +
*Remove META and generator tags
 +
*Use the company's own css or js files and do not store them in framework-specific folders
 +
*Do not use default scripts on the page or obfuscate them if they must be used.
 +
 +
 +
==== Specific files and folders ====
 +
General guidelines:
 +
*Remove any unnecessary or unused files on the server. This implies text files disclosing information about versions and installation too.
 +
*Restrict access to other files so that a 404 response is returned when they are accessed from outside. This can be done, for example, by modifying the .htaccess file and adding RewriteCond or RewriteRule directives there. An example of such a restriction for two common WordPress paths is presented below.
 +
<pre>
 +
RewriteCond %{REQUEST_URI} /wp-login\.php$ [OR]
 +
RewriteCond %{REQUEST_URI} /wp-admin/$
 +
RewriteRule $ /http://your_website [R=404,L]
 +
</pre>
 +
 +
 +
However, these are not the only ways to restrict access. In order to automate this process, certain framework-specific plugins exist. One example for WordPress is StealthLogin (http://wordpress.org/plugins/stealth-login-page).
  
Protect the presentation layer web server behind a hardened reverse proxy.
 
  
Obfuscate the presentation layer web server headers.
+
==== Additional approaches ====
* Apache
+
General guidelines:
* IIS
+
*Checksum management
 +
*:The purpose of this approach is to beat checksum-based scanners and not let them disclose files by their hashes. Generally, there are two approaches in checksum management:
 +
*:*Change the location of where those files are placed (i.e. move them to another folder, or rename the existing folder)
 +
*:*Modify the contents - even slight modification results in a completely different hash sum, so adding a single byte in the end of the file should not be a big problem.
 +
*Controlled chaos
 +
*:A funny and effective method that involves adding bogus files and folders from other frameworks in order to fool scanners and confuse an attacker. But be careful not to overwrite existing files and folders or break the current framework!

Latest revision as of 08:35, 21 July 2015

This article is part of the new OWASP Testing Guide v4.

Summary

There is nothing new under the sun, and nearly every web application that one may think of developing has already been developed. With the vast number of free and open source software projects that are actively developed and deployed around the world, it is very likely that an application security test will face a target site that is entirely or partly dependent on these well-known applications (e.g. WordPress, phpBB, MediaWiki). Knowing which web application components are being tested significantly helps the testing process and drastically reduces the effort required during the test. These well-known web applications have known HTML headers, cookies, and directory structures that can be enumerated to identify the application.


Test Objectives

Identify the web application and version to determine known vulnerabilities and the appropriate exploits to use during testing.


How to Test

Cookies

A relatively reliable way to identify a web application is by its application-specific cookies.

Consider the following HTTP request:

GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Cookie: wp-settings-time-1=1406093286; wp-settings-time-2=1405988284
DNT: 1
Connection: keep-alive
Host: blog.owasp.org

The cookies wp-settings-time-1 and wp-settings-time-2 are sent with the request, which indicates that the site runs WordPress. A list of common cookie names is presented in the Common Application Identifiers section. However, it is possible to change the name of a cookie.
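
As a minimal sketch of this check, the Python snippet below compares the cookie names in a Cookie header against a few prefixes copied from the Common Application Identifiers table. The function names, the choice of prefix matching, and the subset of table entries are assumptions made for illustration; they are not part of any tool described in this article.

```python
# Cookie-name prefixes copied from the "Common Application Identifiers" table.
# Only a subset is listed; matching by prefix is an assumption of this sketch.
KNOWN_COOKIE_PREFIXES = {
    "wp-settings": "Wordpress",
    "phpbb3_": "phpBB",
    "BITRIX_": "1C-Bitrix",
    "fe_typo_user": "TYPO3",
    "DotNetNukeAnonymous": "DotNetNuke",
}


def cookie_names(cookie_header):
    """Return the cookie names found in the value of a Cookie header."""
    names = []
    for pair in cookie_header.split(";"):
        name, _, _ = pair.strip().partition("=")
        if name:
            names.append(name)
    return names


def identify_by_cookies(cookie_header):
    """Return the sorted list of applications suggested by the cookie names."""
    matches = set()
    for name in cookie_names(cookie_header):
        for prefix, application in KNOWN_COOKIE_PREFIXES.items():
            if name.startswith(prefix):
                matches.add(application)
    return sorted(matches)


if __name__ == "__main__":
    header = "wp-settings-time-1=1406093286; wp-settings-time-2=1405988284"
    print(identify_by_cookies(header))
```

An empty result only means that no listed prefix matched; since cookie names can be changed, it does not rule an application out.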


HTML source code

This technique is based on finding certain patterns in the HTML page source code. The source often contains a lot of information that helps a tester recognize a specific web application. Common markers are HTML comments that directly disclose the application. More often, application-specific paths can be found, e.g. links to application-specific css and/or js folders. Finally, specific script variables might also point to a certain application.

From the meta tag below, one can easily learn the application used by a website and its version. Comments, specific paths and script variables can all help an attacker quickly identify an instance of an application.

<meta name="generator" content="WordPress 3.9.2" />

Most often such information is placed between the <head></head> tags, in <meta> tags, or at the end of the page. Nevertheless, it is recommended to check the whole document, since it can be useful for other purposes such as inspecting other useful comments and hidden fields.
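
The generator check can be sketched with Python's standard html.parser module. The class and function names below are hypothetical, and the sketch only looks at <meta name="generator"> tags; comments, paths and script variables would need their own checks.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # HTMLParser lowercases attribute names but not values, so the
        # value of "name" is compared case-insensitively ("Generator").
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def find_generators(html):
    """Return the generator strings disclosed in an HTML document."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generators


if __name__ == "__main__":
    page = '<head><meta name="generator" content="WordPress 3.9.2" /></head>'
    print(find_generators(page))
```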

Specific files and folders

Apart from the information gathered from HTML sources, there is another approach that helps an attacker determine the application with high accuracy. Every application has its own specific file and folder structure on the server. As noted above, specific paths can be seen in the HTML page source, but sometimes they are not explicitly referenced there and still reside on the server.

To uncover them, a technique known as dirbusting is used. Dirbusting is brute forcing a target with predictable folder and file names and monitoring the HTTP responses to enumerate server contents. This information can be used both for finding default files and attacking them, and for fingerprinting the web application. Dirbusting can be done in several ways; the example below shows a successful dirbusting attack against a WordPress-powered target using a predefined list and the Intruder functionality of Burp Suite.

Wordpress dirbusting.png

We can see that for some WordPress-specific folders (for instance, /wp-includes/, /wp-admin/ and /wp-content/) the HTTP responses are 403 (Forbidden), 302 (Found, redirection to wp-login.php) and 200 (OK) respectively. This is a good indicator that the target is WordPress-powered. In the same way it is possible to dirbust different application plugin folders and their versions. The screenshot below shows a typical CHANGELOG file of a Drupal plugin, which provides information on the application being used and discloses a vulnerable plugin version.

Drupal botcha disclosure.png

Tip: before starting dirbusting, check the robots.txt file first. Sometimes application-specific folders and other sensitive information can be found there as well. An example of such a robots.txt file is shown in the screenshot below.

Robots-info-disclosure.png
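
The robots.txt tip can be sketched as follows. The snippet extracts the Disallow entries from a robots.txt body and flags those that start with the WordPress folders mentioned above. The function names and the sample file are made up for illustration, and the hint list is deliberately limited to the three folders named in this section.

```python
# Folder prefixes named in this section; other applications would need
# their own entries (not listed here).
FOLDER_HINTS = {
    "/wp-admin": "WordPress",
    "/wp-includes": "WordPress",
    "/wp-content": "WordPress",
}


def disallowed_paths(robots_txt):
    """Return the non-empty Disallow values of a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def application_hints(robots_txt):
    """Map each disallowed path that matches a known folder to an application."""
    hints = {}
    for path in disallowed_paths(robots_txt):
        for prefix, application in FOLDER_HINTS.items():
            if path.startswith(prefix):
                hints[path] = application
    return hints


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /tmp/  # scratch\n"
    print(application_hints(sample))
```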

Specific files and folders are different for each application. It is recommended to install the corresponding application during penetration tests in order to better understand what infrastructure is present and what files might be left on the server. However, several good file lists already exist; one good example is the FuzzDB wordlists of predictable files/folders (http://code.google.com/p/fuzzdb/).
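
A very small dirbusting loop might look like the sketch below. It requests each path from a short list, records the HTTP status code without following redirects, and reports which paths answered with something other than 404. The helper names, the three-entry path list and the "anything but 404" rule are assumptions of this sketch; real tools such as Burp Suite Intruder use much larger wordlists and finer response analysis. Run it only against targets you are authorized to test.

```python
import urllib.error
import urllib.request

# The WordPress folders discussed above; a real list would be much longer.
WORDPRESS_PATHS = ["/wp-includes/", "/wp-admin/", "/wp-content/"]


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following redirects so that 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx and 5xx responses end up here
    except urllib.error.URLError:
        return None


def dirbust(base_url, paths, fetch_status=http_status):
    """Request every path below base_url and return {path: status code}."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) for path in paths}


def present_paths(results):
    """Return the paths that answered with anything other than 404."""
    return [path for path, status in results.items()
            if status is not None and status != 404]


if __name__ == "__main__":
    # "http://my_target" is a placeholder, as in the BlindElephant example.
    print(dirbust("http://my_target", WORDPRESS_PATHS))
```

Passing the status function as a parameter keeps the loop testable without network access: a stub that returns canned status codes can stand in for http_status.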

Common Application Identifiers

Cookies

Application     Cookie name
phpBB           phpbb3_
Wordpress       wp-settings
1C-Bitrix       BITRIX_
AMPcms          AMP
Django CMS      django
DotNetNuke      DotNetNukeAnonymous
e107            e107_tz
EPiServer       EPiTrace, EPiServer
Graffiti CMS    graffitibot
Hotaru CMS      hotaru_mobile
ImpressCMS      ICMSession
Indico          MAKACSESSION
InstantCMS      InstantCMS[logdate]
Kentico CMS     CMSPreferredCulture
MODx            SN4[12symb]
TYPO3           fe_typo_user
Dynamicweb      Dynamicweb
LEPTON          lep[some_numeric_value]+sessionid
Wix             Domain=.wix.com
VIVVO           VivvoSessionId


HTML source code

Application     Keyword
Wordpress       <meta name="generator" content="WordPress 3.9.2" />
phpBB           <body id="phpbb"
Mediawiki       <meta name="generator" content="MediaWiki 1.21.9" />
Joomla          <meta name="generator" content="Joomla! - Open Source Content Management" />
Drupal          <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
DotNetNuke      DNN Platform - http://www.dnnsoftware.com

More info https://www.owasp.org/index.php/Web-metadata

Tools

A list of general and well-known tools is presented below. There are also a lot of other utilities, as well as framework-based fingerprinting tools.


WhatWeb

Website: http://www.morningstarsecurity.com/research/whatweb
Currently one of the best fingerprinting tools on the market. Included in a default Kali Linux build.
Language: Ruby
Matches for fingerprinting are made with:

  • Text strings (case sensitive)
  • Regular expressions
  • Google Hack Database queries (limited set of keywords)
  • MD5 hashes
  • URL recognition
  • HTML tag patterns
  • Custom Ruby code for passive and aggressive operations


Sample output is shown in the screenshot below:

Whatweb-sample.png


BlindElephant

Website: https://community.qualys.com/community/blindelephant
This tool compares checksums of static files against the checksums known for each version of an application, which yields very accurate fingerprinting.
Language: Python

Sample output of a successful fingerprint:

pentester$ python BlindElephant.py http://my_target drupal
Loaded /Library/Python/2.7/site-packages/blindelephant/dbs/drupal.pkl with 145 versions, 478 differentiating paths, and 434 version groups.
Starting BlindElephant fingerprint for version of drupal at http://my_target 

Hit http://my_target/CHANGELOG.txt
File produced no match. Error: Retrieved file doesn't match known fingerprint. 527b085a3717bd691d47713dff74acf4 

Hit http://my_target/INSTALL.txt
File produced no match. Error: Retrieved file doesn't match known fingerprint. 14dfc133e4101be6f0ef5c64566da4a4 

Hit http://my_target/misc/drupal.js
Possible versions based on result: 7.12, 7.13, 7.14

Hit http://my_target/MAINTAINERS.txt
File produced no match. Error: Retrieved file doesn't match known fingerprint. 36b740941a19912f3fdbfcca7caa08ca 

Hit http://my_target/themes/garland/style.css
Possible versions based on result: 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11, 7.12, 7.13, 7.14

...

Fingerprinting resulted in:
7.14


Best Guess: 7.14
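
The idea behind checksum-based fingerprinting can be sketched in a few lines. This is a simplified illustration and not BlindElephant's actual implementation: each fetched file is hashed, looked up in a table of known hashes, and the version sets implied by the matching files are intersected. The fingerprint table, file contents, and version numbers below are invented for the example.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def narrow_versions(known, fetched):
    """Intersect the version sets implied by each fetched file.

    known:   {path: {md5_hex: set_of_versions}}  (hypothetical fingerprint data)
    fetched: {path: file_bytes}
    Returns None when no fetched file matched a known fingerprint.
    """
    candidates = None
    for path, body in fetched.items():
        versions = known.get(path, {}).get(md5_hex(body))
        if versions is None:
            continue  # unknown path or modified file: no information gained
        candidates = set(versions) if candidates is None else candidates & versions
    return candidates


# Made-up file contents and version data, for illustration only.
js_old, js_new = b"// drupal.js A", b"// drupal.js B"
css = b"/* garland */"
known = {
    "/misc/drupal.js": {
        md5_hex(js_old): {"7.10", "7.11"},
        md5_hex(js_new): {"7.12", "7.13", "7.14"},
    },
    "/themes/garland/style.css": {md5_hex(css): {"7.11", "7.12"}},
}
fetched = {
    "/misc/drupal.js": js_new,
    "/themes/garland/style.css": css,
    "/CHANGELOG.txt": b"edited by the site owner",
}
print(narrow_versions(known, fetched))
```

Files that produce no match, such as the edited CHANGELOG.txt here, are skipped rather than treated as evidence against any version.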


Wappalyzer

Website: http://wappalyzer.com
Wappalyzer is a Firefox and Chrome plug-in. It relies only on regular expression matching and needs nothing other than the page to be loaded in the browser. It works entirely at the browser level and presents results as icons. Although it sometimes produces false positives, it is handy for getting an idea of which technologies were used to build a target website immediately after browsing a page.


Sample output of the plug-in is shown in the screenshot below.

Owasp-wappalyzer.png


References

Whitepapers


Remediation

The general advice is to use several of the tools described above and to check logs in order to understand exactly what helps an attacker identify the web framework. By repeating the scans after changes have been made to hide framework traces, it is possible to achieve a better level of security and to make sure that the framework cannot be detected by automatic scans. Below are some specific recommendations grouped by framework marker location, followed by some additional approaches.


HTTP headers

Check the configuration and disable or obfuscate all HTTP headers that disclose information about the technologies used. Here is an interesting article about HTTP header obfuscation using Netscaler: http://grahamhosking.blogspot.ru/2013/07/obfuscating-http-header-using-netscaler.html
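
To verify such changes, the response headers can be checked for names that commonly reveal server-side technology. The sketch below works on an already-captured header dictionary; the header list is illustrative rather than exhaustive, and DISCLOSING_HEADERS and disclosing_headers are names made up for this example.

```python
# Header names that commonly reveal server-side technology
# (an illustrative selection, not an exhaustive list).
DISCLOSING_HEADERS = ("server", "x-powered-by", "x-generator", "x-aspnet-version")


def disclosing_headers(headers):
    """Return {name: value} for response headers that reveal technology details."""
    found = {}
    for name, value in headers.items():
        if name.lower() in DISCLOSING_HEADERS:
            found[name] = value
    return found


# Made-up response headers, for illustration only.
response_headers = {
    "Content-Type": "text/html",
    "Server": "Apache/2.2.22 (Ubuntu)",
    "X-Powered-By": "PHP/5.3.10",
}
print(disclosing_headers(response_headers))
```

An empty result after reconfiguration suggests that these particular headers no longer leak information; other markers still need to be checked separately.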


Cookies

It is recommended to change cookie names by making changes in the corresponding configuration files.
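
As one concrete case, a Django-based application keeps its cookie names in settings.py. The fragment below is a sketch: the setting names and their defaults are Django's, while the replacement values "sid" and "tok" are arbitrary placeholders chosen for this example. Other frameworks expose similar options under different names.

```python
# settings.py (Django): replace the default, easily recognised cookie names.
# The values "sid" and "tok" are arbitrary placeholders for this sketch.
SESSION_COOKIE_NAME = "sid"  # Django default: "sessionid"
CSRF_COOKIE_NAME = "tok"     # Django default: "csrftoken"
```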


HTML source code

Manually check the contents of the HTML code and remove everything that explicitly points to the framework.

General guidelines:

  • Make sure there are no visual markers disclosing the framework
  • Remove any unnecessary comments (copyrights, bug information, specific framework comments)
  • Remove META and generator tags
  • Use the company's own CSS and JS files and do not store them in framework-specific folders
  • Do not use default scripts on the page, or obfuscate them if they must be used.


Specific files and folders

General guidelines:

  • Remove any unnecessary or unused files on the server, including text files that disclose version and installation information.
  • Restrict access to other files so that requests from outside receive a 404 response. This can be done, for example, by adding RewriteCond and RewriteRule directives to the .htaccess file. An example of such a restriction for two common WordPress paths is presented below.
RewriteEngine On
RewriteCond %{REQUEST_URI} /wp-login\.php$ [OR]
RewriteCond %{REQUEST_URI} /wp-admin/$
# With a non-3xx status the substitution is ignored, so "-" is used
RewriteRule ^ - [R=404,L]


However, these are not the only ways to restrict access. In order to automate this process, certain framework-specific plugins exist. One example for WordPress is StealthLogin (http://wordpress.org/plugins/stealth-login-page).


Additional approaches

General guidelines:

  • Checksum management
    The purpose of this approach is to defeat checksum-based scanners so that they cannot identify files by their hashes. Generally, there are two approaches to checksum management:
    • Change the location of the files (i.e. move them to another folder, or rename the existing folder)
    • Modify the contents: even a slight modification results in a completely different hash, so adding a single byte at the end of the file should not be a big problem.
  • Controlled chaos
    A funny and effective method that involves adding bogus files and folders from other frameworks in order to fool scanners and confuse an attacker. But be careful not to overwrite existing files and folders or break the current framework!
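
The checksum management point about modifying file contents can be demonstrated with a short sketch using Python's hashlib. The file content is a made-up stand-in for a default framework stylesheet; appending a single newline byte is enough to produce a different MD5 digest.

```python
import hashlib

# Stand-in for a static file shipped with a framework (made-up content).
original = b"/* default framework stylesheet */\n"
modified = original + b"\n"  # one extra trailing byte

original_digest = hashlib.md5(original).hexdigest()
modified_digest = hashlib.md5(modified).hexdigest()

print(original_digest)
print(modified_digest)
print("digests differ:", original_digest != modified_digest)
```

A scanner that relies on exact hash matches will no longer recognise the modified file, although scanners using other markers are unaffected.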