The proper use of robots.txt is critical

Friends have often asked me about robots.txt recently, so today I will share some of my robots.txt tips and experience with everyone. Criticism is welcome.

File type: robots.txt is a plain text (TXT) file.

How it works: when a search robot (sometimes called a search spider) visits a site, it first checks whether robots.txt exists in the site's root directory. If the file exists, the robot determines which parts of the site it may access from the file's contents; if the file does not exist, the robot simply crawls along the links it finds.
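The check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of the decision a well-behaved robot makes, not how any particular search engine implements it; the helper name `may_crawl`, the bot name `ExampleBot`, and the `example.com` URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url.

    robots_txt is the text of the site's robots.txt,
    or None when the site has no such file (hypothetical convention).
    """
    if robots_txt is None:
        # No robots.txt: the robot just follows links.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /admin/\n"

print(may_crawl(rules, "ExampleBot", "http://example.com/admin/login"))  # False
print(may_crawl(rules, "ExampleBot", "http://example.com/index.html"))   # True
print(may_crawl(None, "ExampleBot", "http://example.com/admin/login"))   # True
```

Passing the file contents in as a string keeps the sketch runnable without network access; a real crawler would first request `/robots.txt` from the site root.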

Usage: User-agent: names the robot that the following rules apply to; Disallow: names the pages to block. Some commonly used ways of writing a robots.txt file follow, for your reference.

What robots.txt does:

1. Site managers can declare which parts of the site they do not want search engine spiders to visit, or specify that search engines index only certain content, which effectively protects the site's management information.

2. Following Occam's razor (http://s. ...), it keeps search result pages and some dynamically generated links from being crawled as duplicate pages, which would lower the overall quality of the site.

3. In the robots.txt file you can also point search engines directly to the location of the sitemap file.

4. It prevents the 404 errors that are generated when a spider comes to the site looking for a robots.txt file that does not exist.
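As a rough illustration of the sitemap point, the sketch below parses a small robots.txt that carries a `Sitemap:` line and reads the location back. It assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available; the sitemap URL is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything may be crawled, plus a sitemap hint.
robots_txt = """\
User-agent: *
Disallow:
Sitemap: http://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```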

Some robots.txt techniques and their concrete use are introduced below.

1. Limiting the range of files that search spiders crawl:

Allow all search spiders to access the site

User-agent: *

Disallow:

Prohibit all search engines from accessing any part of the site

User-agent: *

Disallow: /

Prohibit all search engines from accessing the site's admin and login back end, so that information about the site is not leaked

User-agent: *

Disallow: /admin/

Prohibit one search engine from accessing the site (that is, Taobao blocking Baidu, mentioned below)

User-agent: Baiduspider

Disallow: /

Allow only one search engine to access the site (Google in the following example)
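One way to check rule sets like these before deploying them is to run them through Python's standard `urllib.robotparser`. The sketch below is only a sanity check under stated assumptions: the helper `allowed`, the bot name `ExampleBot`, and the `example.com` host are hypothetical, and the `GOOGLE_ONLY` rule set is one common way of writing a Google-only policy rather than text taken from this article.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, "http://example.com" + path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"
BLOCK_BAIDU = "User-agent: Baiduspider\nDisallow: /\n"
# Assumed form: Googlebot gets an empty Disallow, everyone else is blocked.
GOOGLE_ONLY = "User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /\n"

print(allowed(ALLOW_ALL, "ExampleBot", "/page.html"))     # True
print(allowed(BLOCK_ALL, "ExampleBot", "/page.html"))     # False
print(allowed(BLOCK_ADMIN, "ExampleBot", "/admin/login")) # False
print(allowed(BLOCK_ADMIN, "ExampleBot", "/page.html"))   # True
print(allowed(BLOCK_BAIDU, "Baiduspider", "/page.html"))  # False
print(allowed(BLOCK_BAIDU, "Googlebot", "/page.html"))    # True
print(allowed(GOOGLE_ONLY, "Googlebot", "/page.html"))    # True
print(allowed(GOOGLE_ONLY, "Baiduspider", "/page.html"))  # False
```

Real crawlers may interpret edge cases differently from this parser, so the output is a guide rather than a guarantee.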

