Google's XML Sitemaps


SEOs often recommend Google's XML Sitemaps as part of search engine optimization proposals. I'm not convinced, though, that XML sitemaps offer webmasters any benefit.

I don't think that Google obeys sitemaps much, if at all. Links and the cache (Google's memory of pages it has already seen) direct Googlebot. Sitemaps just tell Google about discrepancies between what "should" be there and where the crawler is actually going.

Sitemaps do have some uses, for example getting a piece of the debugging data that Google is collecting. If your sitemap lists pages for which Googlebot receives a 404 response, the control panel will tell you.
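You can run a rough version of that 404 check yourself. The following is a minimal sketch, assuming the sitemap uses the sitemaps.org 0.9 namespace; the function names (extract_urls, http_status, find_missing) are my own invention and have nothing to do with how Google's control panel works internally.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the sitemap declares the sitemaps.org 0.9 namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return every <loc> value listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def http_status(url):
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want to report.
        return err.code


def find_missing(urls, status_of=http_status):
    """Return the URLs whose status code is 404."""
    return [url for url in urls if status_of(url) == 404]
```

Passing a different status_of function lets you try the logic without touching the network. Some servers answer HEAD differently from GET, so treat the result as a hint rather than a verdict.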

I suspect that Google's engineers created XML sitemaps because they want to know "what is the difference between the way webmasters want us to see their sites and the way Googlebot sees the sites?" The engineers can then look at specific cases in order to work out bugs in Googlebot and to create "site profiles" to deal with spam issues.

To convince webmasters to upload XML sitemaps, Google hints that sitemaps are beneficial for sites. The Webmaster Control Panel is the carrot on a stick. There is evidence that Google is also building long-term profiles on webmasters in order to control spam and click fraud. Google's growing arsenal of webmaster tools makes it easier for the company to keep track of site owners.

My current opinion of XML sitemaps is that they are just an unnecessary headache unless you have a script to generate them automatically when content is updated. I'm not convinced that there is any significant benefit for webmasters to have an XML sitemap on a site.
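For what it's worth, the generating script does not have to be elaborate. Here is a minimal sketch, assuming your publishing system can hand you a list of (URL, last-modified date) pairs; build_sitemap is a hypothetical helper name, and the namespace is the one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace from the sitemaps.org protocol (an assumption about your target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the YYYY-MM-DD form of a W3C datetime.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    # ElementTree escapes characters such as "&" in URLs for us.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([("http://example.com/", date.today())]))
```

Hook something like this into whatever runs when content is published and write the result to sitemap.xml. The returned string has no XML declaration, so add one when writing the file if you want it.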

An interesting experiment would be to create a site and put pages on the site that have no inbound links, but that are listed in the XML sitemap. Will the crawlers fetch them, and how often?
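To score the experiment you would watch the server's access log for fetches of the orphan pages. A rough sketch follows, assuming the log is in the Apache/nginx "combined" format; count_googlebot_fetches and the regular expression are my own, and since anyone can fake a user agent string, the counts are only an approximation.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a "combined" format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_googlebot_fetches(log_lines, orphan_paths):
    """Count requests for each orphan path whose user agent mentions Googlebot."""
    watched = set(orphan_paths)
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match.group("path")
        if path in watched and "Googlebot" in match.group("agent"):
            hits[path] += 1
    return hits
```

Feed it the lines of the log file and the paths that appear only in the sitemap. Running it on a schedule and comparing the counts over time would answer the "how often" part.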

