Talk:Review Webserver Metafiles for Information Leakage (OTG-INFO-003)
It could be added that, from an attacker's point of view, the robots.txt file can provide useful information about the structure of the web server, e.g., directories that are supposed to be "private".
Marco 18:11, 17 August 2008 (EDT)
The intent of robots.txt is *not* to specify access control for directories. To quote the wiki page: "Web spiders/robots/crawlers can intentionally ignore the Disallow directives specified in a robots.txt file. Hence, robots.txt should not be considered as a mechanism to enforce restrictions on how web content is accessed, stored, or republished by third parties."
If you believe this is not communicated clearly or could be reworded, then please amend the wiki page.
cmlh 12:34, 24 August 2008 (GMT +10)
v3 Review Comments
I don't see anything here about actually testing robots.txt or using spiders/robots/crawlers to do anything to the web app. It's nice that we can download the file, that it contains some interesting information, and that there's a Google tool that can do some analysis of it (though we haven't explained what Google Webmaster Tools gives you or provided an example of its output), but where would that lead a tester or attacker?
Rick.mitchell 09:39, 3 September 2008 (EDT)
Reply from @cmlh
Rick may have overlooked the quote "Hence, robots.txt should not be considered as a mechanism to enforce restrictions on how web content is accessed, stored, or republished by third parties." from the "How to Test" section of v3.
The lack of a "Google Webmaster Tools" example is due to my not being the webmaster of owasp.org. This can be resolved in v4 once the webmaster is known.
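On the earlier point that robots.txt can reveal "private" directories to an attacker, a minimal sketch of that first step might look like the following. It assumes the robots.txt body has already been downloaded as text; the function name `disallowed_paths` and the sample file are hypothetical and only illustrate pulling the non-empty Disallow entries out as candidate paths for a tester to review manually.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the non-empty Disallow paths in a robots.txt body, in file order.

    Hypothetical helper for illustration: it does no fetching and no probing.
    """
    paths = []
    for raw in robots_txt.splitlines():
        # Drop comments ("#" to end of line) and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are matched case-insensitively; an empty Disallow means
        # "nothing is disallowed", so it reveals no path and is skipped.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical sample file, not taken from any real site.
sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/  # old dumps
Disallow:
"""

print(disallowed_paths(sample))  # ['/admin/', '/backup/']
```

Each returned path only tells the tester where the site owner asked crawlers not to go; whether that content is actually reachable still has to be checked by requesting it, which is exactly why robots.txt is not an access control mechanism.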