Tuesday, January 11, 2011

A default robots.txt file for all sites

I recently registered an alternate domain, wcstat.com, in order to serve the static content for weathercurrents.com from a cookie-less domain, and to improve page load times.
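For context, the Apache configuration for a static-only domain like this can be very simple. Here's a minimal sketch of such a virtual host, assuming mod_expires is enabled; the document root matches the path in the error log below, but the cache lifetime is illustrative, not my actual configuration:

<VirtualHost *:80>
    ServerName wcstat.com
    DocumentRoot /webdirectory/static

    # Static files never set cookies here, and a far-future Expires
    # header lets browsers cache them aggressively.
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
</VirtualHost>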

I moved all my static content (images, videos, CSS files, JavaScript files) to this domain and configured it in Apache, then began the process of pointing pages at the new URLs. (I also left the static content in its old locations, http://static.weathercurrents.com and http://weathercurrents.com/static/, to avoid breaking any pages. Once the move is complete, I'll follow up with a couple of mod_rewrite rules, sketched below.)
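Those rules would look something like the following, one for each old location. This is a sketch assuming mod_rewrite is enabled; the exact patterns will depend on how the virtual hosts end up:

# In the static.weathercurrents.com virtual host:
RewriteEngine On
RewriteRule ^/(.*)$ http://wcstat.com/$1 [R=301,L]

# In the weathercurrents.com virtual host, for the /static/ path:
RewriteEngine On
RewriteRule ^/static/(.*)$ http://wcstat.com/$1 [R=301,L]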

Looks like I forgot the robots.txt file. Here's an excerpt from this morning's error logs:


[Tue Jan 11 01:52:48 2011] [error] [client 124.115.6.10] File does not exist: /webdirectory/static/robots.txt

Drat.

Fortunately, the fix is an easy one: a default robots.txt file in the root directory for wcstat.com:

User-agent: *
Disallow:

The lesson: every unique domain you own or serve should have a robots.txt containing at least this. Another best practice is to include a Sitemap: entry at the very top, though I'd argue this domain doesn't need one, since it serves only static content referenced from dynamic pages elsewhere.
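For a site that does want one, the whole file is still only three lines (the sitemap URL below is a placeholder, not a real one):

Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow: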


I should also add that another common 404 is for favicon.ico, which browsers request automatically. Every web site should have one of those, too.
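If you keep your icons on the static domain, each page can point there explicitly with a link tag in its head, something like this (hypothetical markup):

<link rel="shortcut icon" href="http://wcstat.com/favicon.ico">

Browsers still request /favicon.ico automatically when a page doesn't specify one, so having the file at the root of every domain keeps those 404s out of the logs too.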
