May 1, 2007

How to use robots.txt

What is the robots.txt?
Robots.txt is a simple text file, which you can write with any text editor, that tells search engines and other bots where not to go on your website. Examples are pages with program files that have no value for search engines indexing your site, pages that are still under construction, or your log files.

A more recent use of the robots.txt file is to tell search engines where you store your sitemap file, by adding a line like this:
Sitemap: http://www.mydomain.com/sitemap.xml
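
To check that such a line is picked up, you can read it back with the robots.txt parser in Python's standard library. This is only a minimal sketch: it assumes Python 3.8 or newer (for site_maps()), and the rest of the sample robots.txt content is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (an assumption for this sketch).
# The Sitemap URL is the example from the text above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /temp/
Sitemap: http://www.mydomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of the Sitemap URLs found,
# or None if the file has no Sitemap line.
sitemaps = parser.site_maps()
print(sitemaps)
```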

Unfortunately, it is also a good source of information for hackers about files and folders on your web server. Robots.txt is not meant to keep files hidden. For this you would need to use a .htaccess file on your server or other techniques.

What does a robots.txt look like?
These three lines keep all robots that obey the rules out of the temp folder:
# Exclude all robots from temp folder
User-agent: *
Disallow: /temp/
Line 1 is a comment line, # starts a comment. This line is optional.
Line 2 lists the robots for which the next line’s command is valid: A wildcard (*) here means all robots.
Line 3 specifies the file or folder you want to exclude. If you just write Disallow: with nothing after it, you allow everything, so be careful not to forget the file or folder name.
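
If you want to see how a rule-obeying robot might read these lines, Python's standard library includes a robots.txt parser. The sketch below is one possible check, not how any particular search engine works; the robot name "AnyBot", the page URLs and the helper function are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, robot, url):
    """Return True if the given robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(robot, url)


# The three-line example from the text above.
RULES = """\
# Exclude all robots from temp folder
User-agent: *
Disallow: /temp/
"""

# A Disallow line without a file or folder name.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"

temp_page = allowed(RULES, "AnyBot", "http://www.mydomain.com/temp/page.html")
home_page = allowed(RULES, "AnyBot", "http://www.mydomain.com/index.html")
empty_rule = allowed(EMPTY_DISALLOW, "AnyBot", "http://www.mydomain.com/temp/page.html")

print(temp_page, home_page, empty_rule)  # False True True
```

The last value shows why the empty Disallow line is risky: with no file or folder name, even the temp folder is open to robots.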

An 'Allow' command exists, but it is not supported by all robots, so it is better to play it safe and use 'Disallow' even if it is more work.

For a folder, write:
Disallow: /foldername/

For a file, use:
Disallow: /filename.html

Only one folder or file per line is allowed. If you want to exclude several folders, use one line per folder/file.
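
If you have many folders and files to exclude, a small script can write the lines for you. This is just a sketch of the one-entry-per-line rule; the function name and the /logs/ folder are made up for illustration.

```python
def build_robots_txt(user_agent, paths):
    """Return robots.txt text with one Disallow line per folder or file."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


# /logs/ is a hypothetical folder; the other two names come from the text above.
text = build_robots_txt("*", ["/temp/", "/logs/", "/filename.html"])
print(text)
```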

If you only want to exclude a certain robot, use its name instead of the wildcard (*):
User-agent: Robotname
Disallow: /temp/
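
To check that such a rule only affects the named robot, you can again use the parser from Python's standard library. This is a sketch under assumptions: "OtherBot" and the page URL are invented for illustration, and real robots may match their names differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the text above: only "Robotname" is kept out of /temp/.
RULES = """\
User-agent: Robotname
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.mydomain.com/temp/page.html"
named_robot = parser.can_fetch("Robotname", url)  # the excluded robot
other_robot = parser.can_fetch("OtherBot", url)   # hypothetical second robot

print(named_robot, other_robot)  # False True
```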

Where to place the robots.txt?
The robots.txt has to be in the same folder as the entry page of your domain, e.g. http://www.mydomain.com/robots.txt
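
Because the file always sits at the top level of the domain, its address can be worked out from any page address on the site. A small sketch of that, with a made-up function name and page URL:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page address is hypothetical; the domain is the example from the text above.
location = robots_url("http://www.mydomain.com/folder/page.html?id=1")
print(location)  # http://www.mydomain.com/robots.txt
```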

After you have created the robots.txt on your computer with Notepad or a similar tool (do not use an HTML editor), name it robots.txt (lower case!) and transfer it with your favourite FTP program using the ASCII transfer option.
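
If you prefer a script to an FTP program, the same steps could be sketched in Python as shown here. Treat it as an illustration only: the function names are made up, the host and login are placeholders you would have to supply, and the upload assumes that the FTP login lands in the folder of your entry page. The demo at the end only writes the file to a temporary folder and does not connect to any server.

```python
import tempfile
from ftplib import FTP
from pathlib import Path

ROBOTS_TXT = "User-agent: *\nDisallow: /temp/\n"


def write_robots_txt(directory, content):
    """Save content as a plain ASCII file named robots.txt (lower case)."""
    path = Path(directory) / "robots.txt"
    # encoding="ascii" raises an error if a non-ASCII character slipped in.
    path.write_text(content, encoding="ascii")
    return path


def upload_ascii(path, host, user, password):
    """Upload the file in ASCII mode. Host and login are placeholders."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(path, "rb") as fp:
            # storlines() sends the file line by line in ASCII transfer mode.
            ftp.storlines("STOR robots.txt", fp)


# Demo: write the file to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_robots_txt(tmp, ROBOTS_TXT)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="ascii")

print(saved_name)
```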

More information about robots.txt
