The robots.txt file sits in the root of your Web site and controls which pages the search engine spiders are allowed to look at.
A robots.txt file is simply a plain text file listing the files you want to keep the spiders away from. Typical pages to exclude are things like thank-you pages, which you wouldn't want listed in the search engines.
Webmasters usually create a robots.txt file by hand, but XSitePro can create one automatically and keep it up to date. Once you've done the initial setup you can forget all about your robots.txt file and leave the work to XSitePro.
By default XSitePro creates a robots.txt file and uploads it to your site even if the file is empty. The reason for this is that most search engine spiders will look for this file, and if your site doesn't have one a '404 Page Not Found' error will be recorded in your Web site's log. Having a robots.txt file present, even an empty one, prevents this from happening.
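To give a concrete idea of what the generated file contains, a minimal robots.txt of the kind described above might look like the following (the page name is illustrative, not one XSitePro produces by default):

```
User-agent: *
Disallow: /thank-you.html
```

The `User-agent: *` line means the rule applies to all spiders, and each `Disallow` line names a path they should not fetch.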
Setting Up Your Robots
To set up the robots.txt for a Web site, go to the Other tab and click the Robots button; the Robots window will appear.
The options available in this screen are as follows:
Create Robots.txt File check-box – Leave this checked if you want to include a robots.txt file (it is checked by default). If you uncheck it, XSitePro will not create a robots file and will not upload a robots.txt file to your web host.
If checked, there are three settings you can specify for each of your Web pages:
Follow: Tell the search engine spider to follow the links on this page.
Index: Allow this page to be included in the search engine's listing.
Disallow: Tell the spider to ignore the page completely and not to index it.
You can apply different settings to each page on your site. For example, you might have some pages that you do not want in the search results (e.g. a thank-you page), and other pages that you also want kept out of the results but whose links you want followed regardless (some people choose to do this with a site map).
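The effect of a Disallow entry can be checked from outside XSitePro as well. As an illustrative sketch (the file contents and URLs below are assumptions, not XSitePro output), Python's standard urllib.robotparser module shows how a well-behaved spider interprets a generated robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content of the kind the Robots tool might generate,
# with the thank-you page set to Disallow.
robots_txt = """\
User-agent: *
Disallow: /thank-you.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant spider will skip the disallowed page but fetch the rest.
print(parser.can_fetch("*", "https://example.com/thank-you.html"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))      # True
```

This only governs crawling; spiders that ignore robots.txt are not blocked by it, which is why the file is a courtesy convention rather than a security measure.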
At the bottom of the list are buttons that let you quickly apply a single setting to every page at once. For example, to set every page to Disallow, click the Disallow button at the very bottom of the list.
Note: You can edit the robots.txt settings for an individual Web page from within that page's Advanced Page Settings. This is covered in detail in the Advanced Page Settings section; essentially, changes made in the Advanced Page Settings update the same settings as presented in this Robots screen, albeit for one page at a time rather than in a full list as in the robots tool.