Introducing Web Robots and Robots.txt
Search engines such as Google and AltaVista use Web spiders, also known as robots, crawlers, agents, wanderers, or worms, to build the indexes for their search databases. These robots traverse a site's tree of HTML pages by loading pages and following hyperlinks, and they report the text and/or Meta tag information they find so that it can be added to the search index.
Robots.txt is a file that spiders consult for instructions on how the site is to be cataloged. It is a plain ASCII text file that sits in the document root of the server, and it defines the documents and/or directories that spiders are prohibited from indexing.
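To make the file format concrete, here is a minimal sketch of how a well-behaved spider might interpret a Robots.txt file, using Python's standard `urllib.robotparser` module. The file content, the paths, and the `ExampleBot` and `AnyCrawler` names are assumptions made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt content; the paths and the bot name are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse Robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Any spider may fetch ordinary pages but not the excluded directories.
print(parser.can_fetch("AnyCrawler", "/index.html"))
print(parser.can_fetch("AnyCrawler", "/cgi-bin/search"))

# ExampleBot is excluded from the whole site.
print(parser.can_fetch("ExampleBot", "/index.html"))
```

Each `User-agent` line opens a group of rules, and each `Disallow` line names a path prefix that the matching spiders should not request. Compliance is voluntary: the file only tells cooperative robots what to skip.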
The robot's activity gives your site better visibility and raises traffic. However, your site might contain:
Naturally, you may wish to restrict access to these resources. You also shouldn't forget that:
The robot exclusion protocol was introduced by Martijn Koster in 1994 to deal with problems arising from the growing popularity of the Internet and the toll Web spiders were taking on system resources. Some of the problems were caused by robots rapid-firing requests, that is, loading pages in rapid succession. Others came from robots indexing information deep in directory trees, indexing temporary information, and even accessing CGI scripts. The robot exclusion protocol was quickly adopted by webmasters and web robot makers as a way to organize and control the indexing process.
Since then, the size of the Internet has increased dramatically and millions of people are now using it. The number of Web robots crawling the Web is much larger than before, and it is more important than ever for every Web site to have a properly created and maintained Robots.txt file.
With Robots.txt Editor, you create robot exclusion files by selecting all robots or a specific user-agent and then adding documents and/or directories, either by entering the path names manually or by selecting them over FTP. Once all the restrictions and directives are set, you can save the Robots.txt file to your hard drive or upload it directly to your server.
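For readers curious about what such a generated file consists of, the following sketch renders a mapping of user-agents to excluded paths as Robots.txt text. It is not how Robots.txt Editor works internally; the `build_robots_txt` function, the paths, and the `ExampleBot` name are hypothetical and only illustrate the output format.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as Robots.txt text."""
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is off limits for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


# Hypothetical restrictions: all robots skip two directories,
# and one named robot is excluded from the whole site.
text = build_robots_txt({
    "*": ["/cgi-bin/", "/tmp/"],
    "ExampleBot": ["/"],
})
print(text)
```

The resulting string can be written to a file named `robots.txt` and placed in the document root of the server.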
With the help of the program, you can also get full statistics on the robots visiting the pages of your Web site.
Controlling Spiders with Robots.txt Files
How can you take control over robot intruders and make them useful for your site?
Before indexing your site, a spider downloads the Robots.txt file, which contains instructions on what should and should not be indexed. The Robots.txt file is therefore the key to controlling spiders. If you have a large Web site or update it frequently, creating and editing this file by hand is hard and tedious work.
Robots.txt Editor is an easy-to-navigate visual editor that enables you to specify different directives for selected spiders in specific areas of the site and to generate the Robots.txt file quickly and easily. You will not have to waste time creating Robots.txt files by hand and wondering whether they are formatted correctly! Used along with the other Web site promotion software from Net Promoter, Robots.txt Editor enhances the efficiency and effectiveness of your optimization strategy.
Robots.txt Editor, along with FTP Uploader and a number of other useful features, makes up a powerful and handy tool for controlling spiders on your Web site.
Analyzing Spiders' Visits To Your Web Site
All spider visits are recorded in a log file stored on your server. You could, of course, review all the log files manually and compile your own database.
But why bury yourself in this dull and time-consuming task? You can take advantage of our Log Analyzer module, which allows you to download and save log files with information on spider visits and to export that information into different file formats. The information is also structured: you can track spider visits by time period or by specific page, and generate reports by various criteria.
This information is extremely useful when promoting your Web site to major search engines around the world. You know exactly when a spider or robot visits and which pages on your Web site it requests. You don't have to guess whether a new page was visited by a search engine spider or not. Your log file contains all this information.
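As a rough illustration of what reviewing a log by hand involves, the sketch below picks spider visits out of server log lines. It assumes the log uses the common "combined" access log format and that spiders can be recognized by substrings in their user-agent string; the sample lines, the `ExampleBot` name, and the marker list are invented for the example and are not taken from the Log Analyzer module.

```python
import re
from collections import Counter

# Combined log format (assumed):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Substrings that mark a user-agent as a spider; illustrative and incomplete.
SPIDER_MARKERS = ("bot", "crawler", "spider", "slurp")


def spider_visits(lines):
    """Yield (time, path, agent) for lines whose user-agent looks like a spider."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and any(m in match["agent"].lower() for m in SPIDER_MARKERS):
            yield match["time"], match["path"], match["agent"]


# Hypothetical log excerpt: two spider requests and one browser request.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Oct/2004:13:55:40 +0000] "GET /index.html HTTP/1.0" 200 2326 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2004:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

visits = list(spider_visits(SAMPLE_LOG))
pages = Counter(path for _, path, _ in visits)  # spider requests per page

for time, path, agent in visits:
    print(time, path, agent)
```

Matching on user-agent substrings is a heuristic: a spider that does not identify itself, or a browser whose user-agent happens to contain one of the markers, will be misclassified.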
Spider Log Analyzer





