Search engine robots and how they actually work

Records in the /robots.txt file

Extended comments on the format

Each record begins with a User-Agent line, which states which search robot or robots the record is intended for. The next line is Disallow, which lists the paths and files that are not to be indexed. EACH record SHOULD have at least these two lines. All other lines are optional.

A record may contain any number of comment lines. Each comment line must begin with the # character. Comments may also be placed at the end of User-Agent and Disallow lines. A # character is sometimes added at the end of these lines to show the search robot that a long agent_id or path_root value has ended.

If a User-Agent line lists several agent_id values, the path_root condition in the Disallow line applies equally to all of them. There is no limit on the length of User-Agent and Disallow lines. If a search robot does not find its agent_id in /robots.txt, it ignores /robots.txt.
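The record layout described above can be sketched in a few lines of Python. This is only an illustration of the format as this post describes it, not the parser of any real robot. The function name parse_records and the dictionary keys are made up for the example, and it assumes that several agent_id or path_root values on one line are separated by spaces.

```python
def parse_records(text):
    """Split robots.txt text into records of agent_ids and path_roots."""
    records = []
    current = None
    for raw in text.splitlines():
        # Everything after a '#' is a comment, on its own line or at line end.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        values = value.split()  # assumption: values are space-separated
        if field == "user-agent":
            # A User-Agent line opens a new record.
            current = {"agents": values, "disallow": []}
            records.append(current)
        elif field == "disallow" and current is not None:
            current["disallow"].extend(values)
    return records


sample = """\
User-Agent: Copernicus Fred
Disallow:
User-Agent: * Rex
Disallow: /t # a trailing comment
"""

for record in parse_records(sample):
    print(record)
# {'agents': ['Copernicus', 'Fred'], 'disallow': []}
# {'agents': ['*', 'Rex'], 'disallow': ['/t']}
```

An empty Disallow line leaves the list of path_root values empty, which is how the sketch represents "nothing is forbidden".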

If the specifics of each search robot do not need to be taken into account, exclusions can be specified for all robots at once. This is done with the line User-Agent: *

If a search robot finds several records in /robots.txt whose agent_id value matches it, the robot is free to choose any of them.
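A possible way for a robot to pick its record is sketched below. The helper names matching_records and choose_record are hypothetical, agent_id values are compared exactly (the post does not say whether case matters for them), and preferring a record that names the robot over the * record is just one policy among those the rule above permits. The sample data is example 1 from this post.

```python
def matching_records(records, agent_id):
    """Return every record that names agent_id or the wildcard '*'."""
    return [r for r in records
            if agent_id in r["agents"] or "*" in r["agents"]]


def choose_record(records, agent_id):
    """Pick one matching record; prefer an explicit name over '*'.

    Returns None when nothing matches, in which case the robot
    would ignore /robots.txt altogether.
    """
    candidates = matching_records(records, agent_id)
    for record in candidates:
        if agent_id in record["agents"]:
            return record
    return candidates[0] if candidates else None


records = [
    {"agents": ["*"], "disallow": ["/"]},
    {"agents": ["Lycos"], "disallow": ["/cgi-bin/", "/tmp/"]},
]

print(len(matching_records(records, "Lycos")))   # 2
print(choose_record(records, "Lycos")["disallow"])     # ['/cgi-bin/', '/tmp/']
print(choose_record(records, "OtherBot")["disallow"])  # ['/']
```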

Each search robot determines the absolute URL to read from the server using the records in /robots.txt. Uppercase and lowercase characters in path_root MATTER.

Examples:

Example 1:

User-Agent: *
Disallow: /

User-Agent: Lycos
Disallow: /cgi-bin/ /tmp/

In example 1, the /robots.txt file contains two records. The first applies to all search robots and forbids indexing of all files. The second applies to the Lycos search robot and, when Lycos indexes the server, forbids the directories /cgi-bin/ and /tmp/ and allows everything else. As a result, the server will be indexed only by Lycos.
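The outcome of example 1 can be checked with Python's standard urllib.robotparser module. That parser expects one path per Disallow line, so in this sketch the Lycos record is written with two Disallow lines; that rewriting, the helper name allowed and the agent name SomeOtherBot are assumptions made for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Example 1, with one path per Disallow line for the standard parser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Lycos
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parser whether this agent may fetch this path."""
    return parser.can_fetch(agent, path)


print(allowed("Lycos", "/index.html"))         # True
print(allowed("Lycos", "/tmp/scratch.txt"))    # False
print(allowed("Lycos", "/cgi-bin/search"))     # False
print(allowed("SomeOtherBot", "/index.html"))  # False
```

Only Lycos gets access to anything, and only outside the two listed directories, which matches the reading given above.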

Example 2:

User-Agent: Copernicus Fred
Disallow:

User-Agent: * Rex
Disallow: /t

In example 2, the /robots.txt file contains two records. The first allows the search robots Copernicus and Fred to index the entire server. The second forbids all robots, and the robot Rex in particular, to index directories and files such as /tmp/, /tea-time/, /top-cat.txt, /traverse.this and so on. This is a case of specifying a mask for directories and files.
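The mask in example 2 behaves like a plain prefix test, which a short Python sketch makes concrete. The function name is_disallowed is made up for the example, and it assumes path_root is compared character by character with case preserved, in line with the remark that uppercase and lowercase characters matter.

```python
def is_disallowed(path, path_roots):
    """Return True if path starts with any path_root (case-sensitive)."""
    return any(path.startswith(root) for root in path_roots)


mask = ["/t"]  # the Disallow value from the second record of example 2

for path in ["/tmp/", "/tea-time/", "/top-cat.txt", "/traverse.this",
             "/index.html", "/Tmp/"]:
    print(path, is_disallowed(path, mask))
# /tmp/ True
# /tea-time/ True
# /top-cat.txt True
# /traverse.this True
# /index.html False
# /Tmp/ False

# An empty Disallow line yields no path_roots, so nothing is forbidden.
print(is_disallowed("/tmp/", []))  # False
```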

Example 3:

# This is for every spider!
User-Agent: *
# stay away from this
Disallow: /spiders/not/here/ #and everything in it
Disallow: # a little nothing
Disallow: #This could be habit forming!
# Don't comments make code much more readable!!!

Example 3 contains a single record. Here all robots are forbidden to index the directory /spiders/not/here/, including paths and files such as /spiders/not/here/really/ and /spiders/not/here/yes/even/me.html. However, this does not cover /spiders/not/ or /spiders/not/her (in the directory /spiders/not/).

Web technologies are very popular today. The Internet is not only a place for entertainment but also a platform for making money. Whatever the reason, to have a presence on the Internet you need a site, and that is when the question of how to make a website arises. Those looking for information on how to build a website are advised to turn to the Internet itself, which is full of documents on how to make a website and on related topics.

In any case, it would not be smart to pass up the opportunity that modern technologies give us. Google and other search engines, social networks, forums and blogs can all help you find information on "make a new website" and similar topics.
