What Is Robots.txt?

January 17, 2010  
For a search engine to keep its listings up to date and present the most accurate results, it performs an action known as a ‘crawl’. This essentially means sending out a ‘bot’ (sometimes called a ‘spider’) that moves around the internet by following links from page to page. Along the way, the bot finds new pages, pages that have been updated, and pages the search engine did not previously know existed. At the end of the crawl, the search engine updates its index, so pages found on the latest crawl can now appear in its results pages. In short, crawling is how a search engine discovers what is on the internet.
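To make the idea of a crawl more concrete, here is a minimal Python sketch that treats a crawl as a breadth-first walk over links. It assumes a small, hypothetical site held in memory as the `LINKS` dictionary; a real bot would download each page and extract its links from the HTML. The helper names `crawl` and `new_pages` are illustrative only, not any search engine's actual code.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
# A real crawler would fetch each URL and pull the links out of its HTML.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/blog", "/contact"],
    "/contact": [],
}

def crawl(start, links):
    """Breadth-first crawl: visit every reachable page once by following links."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # only queue pages we haven't met yet
                seen.add(target)
                queue.append(target)
    return order

def new_pages(found, known):
    """Pages found on this crawl that the index did not previously know about."""
    return [page for page in found if page not in known]

found = crawl("/", LINKS)
print(found)  # ['/', '/about', '/blog', '/blog/post-1', '/blog/post-2', '/contact']

known = {"/", "/about", "/blog"}  # pages already in the (hypothetical) index
print(new_pages(found, known))  # ['/blog/post-1', '/blog/post-2', '/contact']
```

The `seen` set is what stops the bot from looping forever when pages link back to each other, as `/about` and `/` do here.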

However, there may be times when you have a page on your website that you do not want included in search results. For example, you may still be building a page and not want it listed until it is finished. In these cases, you can use a file called robots.txt, placed at the root of your site (for example, www.example.com/robots.txt), to tell search engine bots to ignore the pages you choose.
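As an illustration, the following Python sketch shows what a simple robots.txt might contain and how a bot could check it using the standard library's `urllib.robotparser` module. The paths `/under-construction/` and `/drafts/new-page.html`, the domain `www.example.com`, and the bot name `ExampleBot` are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents (hypothetical paths).
# "User-agent: *" applies the rules to every bot;
# each "Disallow" line names a path prefix that bots should skip.
ROBOTS_TXT = """\
User-agent: *
Disallow: /under-construction/
Disallow: /drafts/new-page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the text directly, no network access

for url in [
    "https://www.example.com/",
    "https://www.example.com/under-construction/index.html",
    "https://www.example.com/drafts/new-page.html",
]:
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(url, "->", verdict)
```

Because the rules sit under `User-agent: *`, they apply to `ExampleBot` even though the file never names it: the home page is allowed, while the two listed paths are blocked.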

Robots.txt is basically a way of telling a search engine “don’t come in here, please”. When a bot visits your site, it checks for a robots.txt file, ‘reads’ it, and will usually skip any URLs the file tells it to avoid, so those pages generally stay out of search results. It isn’t foolproof: robots.txt is a request for bots to ignore a page rather than a hard block. Most bots respect the file, but some “nasty” bots ignore it and crawl everything they find. For the well-behaved bots, once your page is ready to appear in search results, you simply edit your robots.txt file and remove the rule that blocks that page.
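The sketch below uses Python's `urllib.robotparser` to contrast a well-behaved bot that checks robots.txt before fetching with a badly behaved one that never looks, and to show that removing the `Disallow` rule is enough to let the polite bot back in. The page URL and the functions `make_parser`, `polite_bot_visits` and `nasty_bot_visits` are hypothetical; they only model the decision to fetch, and no network requests are made.

```python
from urllib.robotparser import RobotFileParser

PAGE = "https://www.example.com/new-page.html"  # hypothetical page

def make_parser(robots_txt):
    """Parse robots.txt contents supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def polite_bot_visits(parser, url):
    """A well-behaved bot asks robots.txt for permission before fetching."""
    return parser.can_fetch("PoliteBot", url)

def nasty_bot_visits(parser, url):
    """A badly behaved bot never consults robots.txt, so nothing stops it."""
    return True

# While the page is being built, it is disallowed.
before = make_parser("User-agent: *\nDisallow: /new-page.html\n")
# Once it is ready, the rule is removed (an empty Disallow blocks nothing).
after = make_parser("User-agent: *\nDisallow:\n")

print(polite_bot_visits(before, PAGE))  # False
print(nasty_bot_visits(before, PAGE))   # True
print(polite_bot_visits(after, PAGE))   # True
```

The middle line is the point about “nasty” bots: the file only has an effect if the bot chooses to check it.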