Talk:Review Webserver Metafiles for Information Leakage (OTG-INFO-003)

Discussion
It could be added that, from an attacker's point of view, the robots.txt file can provide useful information on the structure of the web server, e.g., directories that are supposed to be "private".

Marco 18:11, 17 August 2008 (EDT)
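Marco's point above can be sketched with a short script (a minimal illustration, not part of the guide; the helper name and sample content are invented): it parses a robots.txt body and lists the Disallow entries an attacker would note as candidate "private" paths.

```python
# Sketch: enumerate Disallow entries from a robots.txt body.
# parse_disallow() is a hypothetical helper, not part of the guide.

def parse_disallow(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow directives."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments
        if line.lower().startswith('disallow:'):
            path = line.split(':', 1)[1].strip()
            if path:  # an empty Disallow value allows everything
                paths.append(path)
    return paths

# In practice the file would first be fetched, e.g. with
# urllib.request.urlopen('https://example.com/robots.txt').
sample = """User-agent: *
Disallow: /admin/
Disallow: /private/reports/
Disallow:
"""
print(parse_disallow(sample))  # ['/admin/', '/private/reports/']
```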

The intent of robots.txt is *not* to specify access control for directories. Hence, to quote the wiki page: "Web spiders/robots/crawlers can intentionally ignore the Disallow directives specified in a robots.txt file [3]. Hence, robots.txt should not be considered as a mechanism to enforce restrictions on how web content is accessed, stored, or republished by third parties."

If you believe this is not communicated clearly or could be reworded, then please amend the wiki page.

cmlh 12:34, 24 August 2008 (GMT +10)

v3 Review Comments
I don't see anything here about actually testing robots.txt or using spiders/robots/crawlers to do anything to the web app. It's nice that we can download the file, that it contains some interesting information, and that there's a Google tool that can do some analysis of it (though we haven't explained what Google Webmaster Tools gives you or provided an example of its output), but where would that lead a tester or attacker? Rick.mitchell 09:39, 3 September 2008 (EDT)

Reply from @cmlh
Rick may have overlooked the quote "Hence, robots.txt should not be considered as a mechanism to enforce restrictions on how web content is accessed, stored, or republished by third parties." from the "How to Test" section of v3.

The lack of the "Google Webmaster Tools" example is due to me not being the webmaster of owasp.org. This can be resolved in v4 once the webmaster is known.


 * For the first part, either that sentence wasn't part of the content I reviewed during the v3 draft, or it didn't seem significant enough given the lead-in.


 * As for the Webmaster Tools stuff, that sounds good; however, like INFO-001, it seems awfully Google-centric.


 * CH - Bing/Yahoo! have been included in the roadmap for v4.


 * Perfect Rick.mitchell (talk) 19:20, 15 August 2013 (CDT)


 * This content also only covers robots.txt, though the heading suggests much broader coverage. So either the content should be expanded or the heading made more specific (IMHO). Rick.mitchell (talk) 15:07, 15 August 2013 (CDT)


 * CH - I included http://www.robotstxt.org/meta.html in the OWASP Testing Guide v3 (since I also presented on this content in 2009 and 2010). I am not sure if it was removed by subsequent edits by others (I haven't checked), but I will include it again for v4.


 * Sounds good. Though I'd still argue that this covers an alternative to robots.txt, not any actually different or other mechanisms as implied by "web server metafiles" in the heading, which to me reads like there are other configuration, instruction, or interaction governors to be discussed in this section. Just my 2 cents. Rick.mitchell (talk) 19:20, 15 August 2013 (CDT)
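To illustrate the META tag alternative discussed above (http://www.robotstxt.org/meta.html): a page can carry per-page robots directives in a `<meta name="robots">` tag. A hedged sketch of extracting them with Python's stdlib HTML parser follows; the class name and page body are invented for illustration.

```python
# Sketch: pull <meta name="robots" content="..."> directives out of a page.
# RobotsMetaParser is an illustrative name, not a real library class.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        if attrs.get('name', '').lower() == 'robots':
            # content is a comma-separated list, e.g. "noindex, nofollow"
            self.directives += [d.strip() for d in attrs.get('content', '').split(',')]

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # ['noindex', 'nofollow']
```

As with robots.txt, these directives are advisory only: a crawler honours them voluntarily, which is exactly the point made in the "How to Test" quote above.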

TODO for v4
1. Insert "Analyze robots.txt using Google Webmaster Tools", i.e. https://support.google.com/webmasters/answer/156449?hl=en&from=35237&rd=1, with owasp.org as the example (not applicable at present, since the webroot doesn't contain a robots.txt).

2. May need to update the reference to OWASP-IG-009 within the "Summary" section depending on the finalisation of the spidering thread (To be created).

3. Add Microsoft/Yahoo! related content. - DONE
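Alongside the vendor tools above, how a *well-behaved* crawler interprets a robots.txt file can be shown with the stdlib `urllib.robotparser` (a small sketch; the sample file content is invented, and a hostile crawler can simply skip this step, per the discussion above):

```python
# Sketch: how a compliant crawler interprets robots.txt, using the
# stdlib urllib.robotparser. A hostile crawler can simply ignore it.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Normally rp.set_url('https://example.com/robots.txt'); rp.read() —
# here we parse an inline sample instead of fetching over the network.
rp.parse("""User-agent: *
Disallow: /admin/
""".splitlines())

print(rp.can_fetch('*', 'https://example.com/admin/'))       # False
print(rp.can_fetch('*', 'https://example.com/index.html'))   # True
```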

TODO for v5
 * http://blog.erratasec.com/2014/05/no-mcafee-didnt-violate-ethics-scraping.html
 * http://blog.osvdb.org/2014/05/07/the-scraping-problem-and-ethics/
 * https://github.com/behindthefirewalls/Parsero
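The Parsero link above points to a tool that checks whether the entries listed in robots.txt are actually reachable. The gist can be sketched as follows (the base URL, paths, and helper name here are illustrative, not taken from Parsero itself):

```python
# Sketch of a Parsero-style check: turn Disallow paths into absolute URLs
# that a tester would then request, flagging any that return HTTP 200.
# build_probe_urls() is an illustrative helper, not part of Parsero.
from urllib.parse import urljoin

def build_probe_urls(base, disallowed_paths):
    return [urljoin(base, p) for p in disallowed_paths]

urls = build_probe_urls('https://example.com/', ['/admin/', '/backup/db.sql'])
print(urls)
# ['https://example.com/admin/', 'https://example.com/backup/db.sql']
# Each URL would then be requested, e.g. with urllib.request, and a
# 200 response noted as a Disallow directive that leaks a live resource.
```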