A better robots.txt for blogs

Dec 21 2011

If you are using a stock WordPress installation, take a look at your robots.txt file. It will most probably say something like:

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes

The problem is that many bots will hit your site very hard unless you specifically block them. Here is a better robots.txt file:

User-agent: *
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /category/
Disallow: /tag/*
Disallow: /tag/
Disallow: /wp-*
Disallow: /login/
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.php$

# Note: "All" is not a wildcard; this record only applies to a crawler named "All".
User-agent: All
Allow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: moget
User-agent: ichiro
Disallow: /

User-agent: NaverBot
User-agent: Yeti
Disallow: /

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: Slurp
Disallow: /

Feel free to fork and create better versions.
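Before deploying a forked version, it may help to check which crawlers it actually blocks. The following is a minimal sketch using Python's standard `urllib.robotparser`, run against a shortened copy of the rules. The sample paths and the `build_parser` helper are assumptions made for illustration. Note that `urllib.robotparser` does plain prefix matching and does not interpret the `*` and `$` patterns in paths, so the sample keeps only prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules, for illustration only (prefix rules, no wildcards).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /tag/

User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /
"""


def build_parser(text):
    """Hypothetical helper: parse robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# (user agent, path) pairs to check; the post path is made up.
checks = [
    ("Yandex", "/2011/12/some-post/"),
    ("Baiduspider", "/"),
    ("Googlebot", "/2011/12/some-post/"),
    ("Googlebot", "/wp-admin/options.php"),
]

for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:12} {path:28} {verdict}")
```

This only tells you how a well-behaved parser reads the file; individual crawlers may interpret edge cases differently.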

Keep in mind that plenty of other bad or useless bots will still consume your resources, and many of them do not respect robots.txt. You will have to block their IPs in your .htaccess (or the equivalent configuration file for your server).
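As a rough sketch of the IP-blocking step, the Python snippet below turns a list of addresses into Apache 2.4 access-control directives that could be pasted into an .htaccess file. The `htaccess_block` function name and the addresses (taken from the reserved documentation ranges) are assumptions for illustration, not real bot IPs. Apache 2.2 uses the older `Order` / `Deny from` syntax instead, and other servers need their own equivalent.

```python
import ipaddress


def htaccess_block(addresses):
    """Return Apache 2.4 directives that deny the given IPs or CIDR ranges."""
    lines = ["<RequireAll>", "    Require all granted"]
    for raw in addresses:
        # ip_network validates the input; it raises ValueError on malformed
        # addresses and accepts both single IPs and CIDR ranges.
        network = ipaddress.ip_network(raw, strict=False)
        if network.num_addresses == 1:
            target = str(network.network_address)
        else:
            target = str(network)
        lines.append(f"    Require not ip {target}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


# Placeholder addresses from the documentation ranges, not real offenders.
print(htaccess_block(["192.0.2.10", "198.51.100.0/24"]))
```

Validating each entry first avoids writing a malformed directive into the configuration, which can take the whole site down with a server error.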

One response so far

  • Worli says:

    I believe newer versions of WordPress already set a nofollow tag on some components, such as wp-admin and wp-login. There is also a WordPress plugin that creates a robots.txt file blocking unwanted bots.
