Difference between revisions of "OWASP File Hash Repository"

From OWASP

Revision as of 16:13, 23 October 2011

Main

FHR FAQ

What is FHR?

Simply put, FHR is a repository of file hashes. The idea, however, is to go beyond keeping a list of hashes: I want the repository to indicate whether a given file is malware (or part of a malware package) or is recognized as benign. Anyone could then look up the hash of a file to see whether it corresponds to known malware or to a known good file.

Aren't there already other sources for this information?

Yes, and one of the ideas of the project is to aggregate and leverage information from existing sources. For example, NIST maintains the NSRL (http://www.nsrl.nist.gov), which provides hashes of known benign files. The problem is that NIST distributes this information as a text file whose download is over 1 GB in size. Other known sources are Team Cymru's MHR (http://www.team-cymru.org/Services/MHR/), the SANS Institute's hash database (http://isc.sans.edu/tools/hashsearch.html) and VirusTotal (http://www.virustotal.com). Beyond aggregating this information, one of the main goals of FHR is to allow free access to its database.

Isn't free access to a database that contains malware dangerous?

It would be, but the project repository will not contain malware. It will store only the hashes of malware files. A hash cannot be used to reconstruct or run the original file, so it poses no danger.

Detecting malware using only hashes is not a good strategy.

Certainly, and the project is not intended to replace current antivirus scanners. However, creating hashes is easier and more efficient than creating generic virus detection algorithms, and hash lookup is already used as a complement to traditional antivirus products. Several commercial products use cloud computing as part of this strategy. Unfortunately, the vendors of these products do not allow outside queries to their hash databases. With FHR, the goal is to create a freely available database that everyone can use.

Will the FHR be integrated into antivirus systems?

We intend to develop clients for the FHR database that can scan workstations and query the database to try to identify malware. These clients will be created as a proof of concept and will be open source. It would be great if some antivirus vendors started supporting FHR, but only time will tell.
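
As a rough illustration of what such a proof-of-concept client might do, the sketch below walks a directory, computes the SHA-1 of each file, and asks a lookup function for its status. It is only a sketch under stated assumptions: the function names (sha1_of_file, scan_directory, make_local_lookup) are hypothetical, and the lookup here is an in-memory dictionary standing in for a real FHR query, which this FAQ does not specify.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterator, Tuple


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(
    root: Path, lookup: Callable[[str], str]
) -> Iterator[Tuple[Path, str]]:
    """Yield (path, status) for every regular file under root."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, lookup(sha1_of_file(path))


def make_local_lookup(known: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an FHR query.

    Hashes missing from the dictionary are reported as UNKNOWN.
    """
    return lambda sha1: known.get(sha1, "UNKNOWN")
```

A real client would replace make_local_lookup with a function that queries the FHR database over one of its query interfaces; the scanning loop itself would not need to change.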

Technically, how does the FHR work?

As expected, the core of the system is its database of hashes, which currently runs on MySQL. Several query interfaces can be built around this database. Some candidate protocols and formats for querying the FHR database are:

  • DNS
  • web
  • web services
  • JSON

The current codebase includes a DNS-based query interface.
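
To give a feel for how a hash lookup can be carried over DNS, the sketch below turns a SHA-1 digest into a query name and parses a text answer. Everything specific in it is an assumption made for illustration: this FAQ documents neither the zone nor the answer format, so the zone fhr.example and the "STATUS CERTAINTY" answer layout are hypothetical placeholders, not the project's actual protocol. The code performs no network access.

```python
import re
from typing import Tuple

SHA1_HEX = re.compile(r"[0-9a-f]{40}")
VALID_STATUSES = {"GOOD", "MALWARE", "UNKNOWN", "SUSPICIOUS"}


def build_query_name(sha1_hex: str, zone: str = "fhr.example") -> str:
    """Build a fully qualified DNS name for a hash lookup.

    The zone is a hypothetical placeholder. A 40-character SHA-1 digest
    fits within the 63-character limit for a single DNS label.
    """
    digest = sha1_hex.strip().lower()
    if not SHA1_HEX.fullmatch(digest):
        raise ValueError("expected a 40-character hexadecimal SHA-1 digest")
    return f"{digest}.{zone.strip('.')}."


def parse_answer(txt: str) -> Tuple[str, int]:
    """Parse a hypothetical 'STATUS CERTAINTY' answer such as 'GOOD 95'."""
    status, certainty = txt.split()
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status}")
    value = int(certainty)
    if not 0 <= value <= 100:
        raise ValueError("certainty must be a percentage between 0 and 100")
    return status, value
```

One appeal of a DNS interface is that a lookup is a single small query that existing resolvers can cache, so a client needs no special library beyond a DNS resolver.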

What data are available in the database?

The database currently holds a little more than 20 million files. Most come from the NSRL, and we have also included several PE files from Windows Vista and other common software. For each registered file, we have the following information:

  • SHA-1
  • MD5
  • source
  • date when the system saw the hash / file for the first time (not available for the files from NIST)
  • status (GOOD, MALWARE, UNKNOWN, SUSPICIOUS)
  • size
  • certainty (a percentage that indicates the degree of certainty about the status of the file).
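
The fields listed above could be modeled roughly as follows. This is a sketch only: the class and attribute names are hypothetical and do not reflect the project's actual MySQL schema, and the validation rules (digest lengths, certainty between 0 and 100) are assumptions inferred from the field descriptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    GOOD = "GOOD"
    MALWARE = "MALWARE"
    UNKNOWN = "UNKNOWN"
    SUSPICIOUS = "SUSPICIOUS"


@dataclass(frozen=True)
class FileRecord:
    sha1: str        # 40 hexadecimal characters
    md5: str         # 32 hexadecimal characters
    source: str      # where the hash came from, e.g. "NSRL"
    status: Status
    size: int        # file size in bytes
    certainty: int   # percentage, 0 to 100
    first_seen: Optional[date] = None  # not available for files from NIST

    def __post_init__(self) -> None:
        if len(self.sha1) != 40 or len(self.md5) != 32:
            raise ValueError("unexpected digest length")
        if self.size < 0:
            raise ValueError("size must not be negative")
        if not 0 <= self.certainty <= 100:
            raise ValueError("certainty must be between 0 and 100")
```

For example, a record imported from the NSRL would leave first_seen as None, since that date is not available for files from NIST.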

Testing the system

We will soon integrate the system into the global DNS, so everyone can query the database through the DNS interface.

Project About

PROJECT INFO
What does this OWASP project offer you?
RELEASE(S) INFO
What releases are available for this project?
what is this project?
Name: OWASP File Hash Repository (home page)
Purpose: The goal of this project is to build a repository of hashes of executable and source files. Clients can then query this repository to determine the status of files based on their hashes. Some statuses are GOOD, MALWARE, SOURCE CHECKED, etc. The repository can consolidate several available sources (NIST, MHR, VirusTotal, etc.) and provide better query capabilities.
License: Apache 2.0 License
who is working on this project?
Project Leader(s):
  • Alexandre Pupo @
how can you learn more?
Project Pamphlet: Not Yet Created
Project Presentation:
Mailing list: Mailing List Archives
Project Roadmap: View
Key Contacts
  • Contact Alexandre Pupo @ to contribute to this project
  • Contact Alexandre Pupo @ to review or sponsor this project
  • Contact the GPC to report a problem or concern about this project or to update information.
current release
Not Yet Published
last reviewed release
Not Yet Reviewed


other releases