###############################################################################
# This (commented) information is for beginning web designers:                #
# I would have found it useful, so I am including it for others.              #
###############################################################################
#                                                                             #
# robots.txt - [this file] Tells scanning robots                              #
#              where they are and are not welcome.                            #
#                                                                             #
# Quickly, how it works:                                                      #
#   *           = wildcard, matches all robots                                #
#   /           = root of server (start of path name)                         #
#   User-agent: = name of a specific robot, or "*" for all                    #
#   Disallow:   = set this to the first part of the path                      #
#                 (starting with "/") you don't want                          #
#                 to allow robots to visit                                    #
#                                                                             #
###############################################################################
#                                                                             #
# This is the official source for information on the robots.txt file:         #
#   The Web Robots Pages - http://www.robotstxt.org/wc/robots.html            #
#                                                                             #
# The specification for robots.txt is here:                                   #
#   http://www.robotstxt.org/wc/norobots.html                                 #
#                                                                             #
###############################################################################
#                                                                             #
# The following information is                                                #
# from: http://www.robotstxt.org/wc/exclusion-admin.html                      #
#                                                                             #
# To exclude all robots from the entire server                                #
#   User-agent: *                                                             #
#   Disallow: /                                                               #
#                                                                             #
# To allow all robots complete access                                         #
#   User-agent: *                                                             #
#   Disallow:                                                                 #
#                                                                             #
# To exclude all robots from part of the server                               #
#   User-agent: *                                                             #
#   Disallow: /cgi-bin/                                                       #
#   Disallow: /tmp/                                                           #
#   Disallow: /private/                                                       #
#                                                                             #
# To exclude a single robot                                                   #
#   User-agent: BadBot                                                        #
#   Disallow: /                                                               #
#                                                                             #
# To allow a single robot                                                     #
#   User-agent: WebCrawler                                                    #
#   Disallow:                                                                 #
#                                                                             #
#   User-agent: *                                                             #
#   Disallow: /                                                               #
#                                                                             #
# To exclude all files except one                                             #
# This is currently a bit awkward, as there is no "Allow" field. The easy     #
# way is to put all files to be disallowed into a separate directory, say     #
# "docs", and leave the one file in the level above this directory:           #
#   User-agent: *                                                             #
#   Disallow: /~ppcwd/docs/                                                   #
#                                                                             #
# Alternatively you can explicitly disallow each page individually:           #
#   User-agent: *                                                             #
#   Disallow: /~ppcwd/private.html                                            #
#   Disallow: /~ppcwd/foo.html                                                #
#   Disallow: /~ppcwd/bar.html                                                #
###############################################################################
###############################################################################
# Here are the contents of robots.txt for this server:                        #
###############################################################################
User-agent: *    # all robots are to use the same rules at this site!
# files to ignore
Disallow: /.htaccess
# directories to ignore
Disallow: /cgi-bin/
Disallow: /css/
Disallow: /doc/
Disallow: /ecmascript/
Disallow: /error/
Disallow: /forbidden/
Disallow: /images/
Disallow: /includes/
Disallow: /pdf/
Disallow: /sorry/
Disallow: /success/
Disallow: /temp/
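#
# Illustration only. The paths below are hypothetical and are not taken from
# this server. Since a Disallow value is matched against the first part of
# the requested path, a robot that follows the original specification would
# probably treat the rules in this file roughly as follows:
#   /temp/scratch.html   -> not visited  (starts with /temp/)
#   /temporary.html      -> visited      (does not start with /temp/)
#   /images/logo.gif     -> not visited  (starts with /images/)
#   /index.html          -> visited      (matches no Disallow line)
# The trailing "/" on a directory rule is what keeps it from also matching
# files whose names merely begin with the same letters.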