###############################################################################
# This (commented) information is for starting web designers:                 #
# I would have found it useful, so I am including it for others.              #
###############################################################################
#                                                                             #
# robots.txt - This file tells scanning robots                                #
#              where they are and are not welcome.                            #
#                                                                             #
# Quickly, how it works:                                                      #
#   User-agent: "*" means any (other) robot,                                  #
#               or specify a robot by name.                                   #
#   Disallow: if the value matches the start of the requested path,           #
#             the robot must not fetch it.                                    #
#                                                                             #
###############################################################################
#                                                                             #
# The original source for information on the robots.txt file:                 #
#   The Web Robots Pages - http://www.robotstxt.org/wc/robots.html            #
#                                                                             #
# The specification for robots.txt is here:                                   #
#   http://www.robotstxt.org/wc/norobots.html                                 #
#                                                                             #
###############################################################################
#                                                                             #
# The following information is                                                #
# from: http://www.robotstxt.org/wc/exclusion-admin.html                      #
#                                                                             #
# To exclude all robots from the entire server                                #
#   User-agent: *                                                             #
#   Disallow: /                                                               #
#                                                                             #
# To allow all robots complete access                                         #
#   User-agent: *                                                             #
#   Disallow:                                                                 #
#                                                                             #
# To exclude all robots from part of the server                               #
#   User-agent: *                                                             #
#   Disallow: /cgi-bin/                                                       #
#   Disallow: /tmp/                                                           #
#   Disallow: /private/                                                       #
#                                                                             #
# To exclude a single robot                                                   #
#   User-agent: BadBot                                                        #
#   Disallow: /                                                               #
#                                                                             #
# To allow a single robot                                                     #
#   User-agent: WebCrawler                                                    #
#   Disallow:                                                                 #
#                                                                             #
#   User-agent: *                                                             #
#   Disallow: /                                                               #
#                                                                             #
# To exclude all files except one                                             #
# This is currently a bit awkward, as the original specification has no       #
# "Allow" field. The easy way is to put all files to be disallowed into a     #
# separate directory, say "docs", and leave the one file in the level above   #
# this directory:                                                             #
#   User-agent: *                                                             #
#   Disallow: /~zac/docs/                                                     #
#                                                                             #
# Alternatively, you can explicitly disallow all disallowed pages:            #
#   User-agent: *                                                             #
#   Disallow: /~zac/private.html                                              #
#   Disallow: /~zac/foo.html                                                  #
#   Disallow: /~zac/bar.html                                                  #
###############################################################################

###############################################################################
# Here are the contents of robots.txt for this server:                        #
###############################################################################

User-agent: *    # all robots are to use the same rules at this site!
# files to ignore
Disallow: /archive.html
Disallow: /freeicons.htm
Disallow: /index.html
Disallow: /knowyourights.htm
Disallow: /knowyourights.html
Disallow: /knowyourrights.htm
Disallow: /knowyourrights.html
Disallow: /.htaccess
# directories to ignore
Disallow: /amber/
Disallow: /cgi-bin/
Disallow: /css/
Disallow: /doc/
Disallow: /ebay/
Disallow: /ecmascript/
Disallow: /error/
Disallow: /fresh/
Disallow: /freshasarose/
Disallow: /gifs/
Disallow: /guestbook/
Disallow: /html/
Disallow: /images/
Disallow: /includes/
Disallow: /india/
Disallow: /k/
Disallow: /knowyourrights/
Disallow: /mp3/
Disallow: /navigation/
Disallow: /pdf/
Disallow: /pictures/
Disallow: /resume/
Disallow: /sorry/
Disallow: /success/
Disallow: /sadie/
Disallow: /search/
Disallow: /search-results/
Disallow: /ssi/
Disallow: /temp/
Disallow: /test/
Disallow: /text/
Disallow: /wav/
Disallow: /wwwstat/
Disallow: /zip/
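#
# Worked example (a sketch for illustration, not part of the rules):
# how the "matches the start of the requested path" test would apply to
# the Disallow lines above. The request paths below are hypothetical and
# are not claimed to exist on this server.
#
#   Requested path        First matching rule          Fetched?
#   /k/notes.html         Disallow: /k/                no
#   /freeicons.html       Disallow: /freeicons.htm     no (prefix match)
#   /search/?q=robots     Disallow: /search/           no
#   /contact.html         (none)                       yes
#   /                     (none)                       yes
#
# Two consequences of prefix matching worth noting:
#   - "Disallow: /index.html" does not match the bare path "/".
#   - "Disallow: /k/" does not match "/knowyourrights/", because the
#     rule includes the trailing slash; that directory needs its own line.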