This article is by no means an attempt to explain how search engines work in general. However, in my opinion, it will help you understand how to control the behavior of search robots (wanderers, spiders, robots: programs with which a search engine rummages through the network and indexes the documents it encounters) and how to structure a server and the documents on it so that your server is indexed easily and well.

The first reason I dared to write this article was an occasion when I was examining my server's access log file and found the following in it: lycosidae.lycos.com

That is, Lycos had contacted my server, learned on its first request that the file /robots.txt did not exist, and then sniffed at the first page. Naturally, I did not like this, and I set out to find out what was going on.

It turns out that all "smart" search engines first request this file, which should be present on every server. The file describes access rights for search robots, and different robots can be given different rights. There is a standard for this called the Standard for Robot Exclusion.

According to Louis Monier of AltaVista, only 5% of all sites currently have a non-empty /robots.txt file, if the file exists there at all. This is confirmed by information gathered in a recent study of the Lycos robot's logs. Charles P. Kollar of Lycos writes that only 6% of all requests for /robots.txt return a result code of 200. Here are some reasons why this happens:

* people who set up a Web server simply do not know about this standard, or about the need for a /robots.txt file.

* the person who installs the Web server is not necessarily the one who fills it with content, and the web designer who does has no proper contact with the administrator.

* this figure reflects the number of sites that really need to exclude superfluous robot requests, since not every server has traffic so heavy that a visit from a search robot becomes noticeable to ordinary users.

Format of the /robots.txt file:

The /robots.txt file is intended to instruct all search robots (spiders) to index information servers as defined in this file, i.e. only those directories and files of the server that are not described in /robots.txt. The file should contain zero or more records, each associated with a particular robot (as determined by the value of the agent_id field), that specify for each robot, or for all robots at once, exactly what SHOULD NOT be indexed.
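To make the record structure concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The file contents, the robot name `lycos`, and all paths are hypothetical and chosen only for illustration; the article itself does not prescribe any of them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt with two records: one for a robot whose name
# contains "lycos", and one for all other robots ("*").
# Every path below is made up for illustration.
ROBOTS_TXT = """\
User-agent: lycos
Disallow: /private/
Disallow: /drafts/notes.html

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the robot identified by `agent` may index `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Lycos_Spider_(Rex)/1.0", "OtherBot/2.0"):
        for path in ("/index.html", "/drafts/notes.html", "/private/x.html"):
            print(agent, path, allowed(agent, path))
```

Anything not listed in a `Disallow` line stays open to indexing, which matches the description above: the file names only what should not be indexed.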

Whoever writes the /robots.txt file should specify a substring of the Product Token from the User-Agent field that each robot sends in its HTTP request to the server being indexed. For example, the current Lycos robot sends the following User-Agent field in such a request: Lycos_Spider_(Rex)/1.0 libwww/3.1.
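The matching of a record to a robot can be sketched as follows. This is an illustration, not the code of any real robot: it assumes the case-insensitive substring match on the name without version information that the exclusion standard recommends, and it assumes the Lycos User-Agent string is written without internal spaces. The function names `product_token` and `record_applies` are hypothetical.

```python
def product_token(user_agent: str) -> str:
    """Return the first product name without its version.

    For "Name/1.0 lib/3.1" this returns "Name".
    """
    parts = user_agent.split()
    if not parts:
        return ""
    # The first whitespace-separated item is "name/version"; keep the name.
    return parts[0].split("/")[0]


def record_applies(record_agent: str, user_agent: str) -> bool:
    """Decide whether a robots.txt record is meant for this robot.

    Assumption: "*" matches every robot; otherwise the record's value must
    occur, ignoring case, inside the robot's product token.
    """
    if record_agent == "*":
        return True
    return record_agent.lower() in product_token(user_agent).lower()


if __name__ == "__main__":
    ua = "Lycos_Spider_(Rex)/1.0 libwww/3.1"
    print(product_token(ua))
    print(record_applies("lycos", ua))
```

Under these assumptions a record written for `lycos` would apply to the robot above, so the file's author does not need to reproduce the full User-Agent string.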

If the Lycos robot does not find a description for itself in /robots.txt, it acts as it sees fit. As soon as the Lycos robot "sees" a description for itself in /robots.txt, it does as it is told.

When creating a /robots.txt file, one more factor must be considered: file size. Since every file that should not be indexed is described, and moreover separately for many types of robots, a large number of excluded files makes /robots.txt too big. In that case, one or more of the following methods of reducing the size of /robots.txt should be applied:

* specify a directory that should not be indexed and, accordingly, place the files that are not to be indexed in it.

* design the structure of the server so as to simplify the description of exclusions in /robots.txt.

* specify a single indexing method for all agent_id values.

* specify masks for directories and files.
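The effect of the first three methods can be sketched in Python with the standard library's `urllib.robotparser`. The robot names, file names, and the `/noindex/` directory are hypothetical, and masks are left out of the sketch.

```python
from urllib.robotparser import RobotFileParser

# Verbose variant: every excluded file is listed separately for each robot.
# Robot names and file names are hypothetical.
VERBOSE = """\
User-agent: lycos
Disallow: /a.html
Disallow: /b.html
Disallow: /c.html

User-agent: otherbot
Disallow: /a.html
Disallow: /b.html
Disallow: /c.html
"""

# Compact variant: the same files are assumed to have been moved into one
# directory, and a single record covers all robots.
COMPACT = """\
User-agent: *
Disallow: /noindex/
"""


def load(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text held in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


compact = load(COMPACT)

if __name__ == "__main__":
    print(len(VERBOSE), "characters versus", len(COMPACT))
    print(compact.can_fetch("AnyBot/1.0", "/noindex/a.html"))
    print(compact.can_fetch("AnyBot/1.0", "/index.html"))
```

The compact file stays the same size however many files are later added to the excluded directory, whereas the verbose one grows with every file and every robot.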

