I've decided to organize a section of this Web site around posts related to Drupal SEO. I'll be adding a couple of other tutorials in the next week or two. In the meantime, check out the previous Drupal search engine optimization articles below.
If you are completely new to search engine optimization, read and bookmark the intro to SEO page, and check out SEO Elite Software.
I also offer Drupal SEO consulting services.
I offer two Drupal-related SEO services:
I've written some of the most comprehensive Drupal SEO tutorials on the Web, including the Drupalzilla robots.txt tutorial and the basic Drupal SEO tutorial that is at the top of Google for drupal seo right after Drupal.org.
An SEO Site Audit is a detailed analysis of your site's configuration and structure, and it contains recommendations on optimizing your SEO with Drupal-specific tips.
SEO site audits are delivered in PDF format, with 2 hours of consulting beyond the delivery of the site audit. A typical site audit is between 35 and 50 pages in length. SEO site audits are done at a flat rate of $2000 USD.
An SEO Campaign is a longer consulting agreement of 6 or more months where I work with your Web developers to systematically increase traffic through comprehensive search engine and social media optimization techniques.
For more information, please inquire through the form below:

NEW! This Drupal SEO tutorial has been updated and rewritten in May 2008.
Drupal is a great open source GPL content management system. With a few modifications it can be configured for excellent on-site search engine optimization. This tutorial only covers the very basics of on-site optimization. It will make sure that search engines are able to spider your site, and prevent some common Drupal SEO errors.
This is just a basic introduction to configuring a Drupal site for good search engine rankings. Other tutorials will go into more depth.
Search engines prefer clean URLs. In Drupal 6, clean URLs should be automatically enabled if your server allows it. In Drupal 5 you can enable clean URLs under administer —> settings —> Clean URLs. Clean URLs are necessary for the pathauto module, mentioned below.
The pathauto module is highly recommended. Pathauto will automatically make nice customized URLs based on things like title, taxonomy, content type, and username. You also have to enable the path module for pathauto to work.
Think carefully about how you want your URLs to look. It takes some experience with Drupal to get the exact URL paths that you might want. The URLs are controlled by a combination of taxonomy and pathauto, and I hope to cover that in another tutorial. You can also use the path module to write custom URLs for each page, but that might become tedious and inconsistent on a large site.
At the very least, enable the path module and install the pathauto module. It will generate nice-looking URLs for you without much configuration.
Caution: The above advice is directed towards new Drupal sites. If you have an existing Drupal site be very careful that you don't rename your previously existing URLs with the pathauto module. It is generally a very bad idea to change existing URLs because the search engines will no longer be able to find those pages.
Here are some pathauto settings to watch out for:
For update action choose "Do nothing. Leave the old alias intact." Otherwise the URLs of nodes will change every time you change the title of your post, causing problems with search engines:

There is also a more comprehensive Pathauto tutorial.
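To make the alias advice above concrete, here is a minimal Python sketch of the idea. It is not Pathauto's actual code; the function names `make_alias` and `alias_after_update` are hypothetical, and the slug rule (lowercase, hyphens between words) is only an assumption about what a "nice" URL looks like.

```python
import re


def make_alias(title):
    """Build a lowercase, hyphen-separated alias from a post title."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def alias_after_update(existing_alias, new_title):
    """Mimic the 'Do nothing. Leave the old alias intact.' update action."""
    # An existing alias is never regenerated, so the URL stays stable
    # even when the title of the post changes.
    return existing_alias if existing_alias else make_alias(new_title)


print(make_alias("Drupal SEO: The Basics!"))
print(alias_after_update("drupal-seo-the-basics", "Drupal SEO Basics, Revised"))
```

The second call keeps the old alias, which is the behavior you want on a site that search engines have already indexed.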
The Global Redirect Module will automatically do 301 redirects to your URL aliases. So if you have a node at example.com/node/5, the Global Redirect Module will redirect that URL to your alias at example.com/my-page.
Read more about the Global Redirect Module.
The Meta Tags Module (formerly called "Nodewords Module") can be highly beneficial to your site. There is a myth in some search engine optimization circles that says, "meta tags are not important". This is not true.
Meta tags are not meant to be used for keyword stuffing. Don't use them for that purpose because it isn't going to help you. The really important meta tag is the meta description.
The meta description should be different on every page for best results. The meta description should be one or two brief sentences to summarize the page. It should be written for your human visitors, but it is not a bad idea to tastefully and sparingly insert a couple of your keywords. Often when a search engine lists your site in the search engine results pages, it will use your page's HTML title for the title, and your meta description for the text snippet. That is why the meta description should be written with human visitors in mind. You want a text snippet that is going to make them want to click on the link.
Here is one textbook example from this site in the Google SERPs:

I generally configure the Drupal Nodewords module to output the meta description and meta keywords on every page. I have a few default keywords set, and add a couple more on every post to make a unique combination of relevant keywords. I don't spend much time with it because I don't think the meta keywords are that important.
On the nodewords module's administration page, be sure to check the box that says "Use the teaser of the page if the meta description is not set?". That way each page will get a unique meta description even if you have denied access to create custom meta tags for nodes to some users.
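The teaser fallback described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Nodewords module works internally; the function name `meta_description` and the two-sentence limit are assumptions taken from the "one or two brief sentences" guideline.

```python
import html
import re


def meta_description(teaser, max_sentences=2):
    """Build a meta description tag from the first sentences of a teaser."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", teaser.strip())
    summary = " ".join(sentences[:max_sentences])
    # Escape quotes and angle brackets so the attribute stays valid HTML.
    return '<meta name="description" content="%s" />' % html.escape(summary)


print(meta_description("Drupal is a CMS. It is open source. It is GPL."))
```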
The Page Title Module allows you to set custom page titles on every page. Highly recommended.
Google Sitemaps are not essential, but I've been adding them to my Drupal sites. I think that Google Sitemaps were created by Google primarily for debugging Googlebot and not for the benefit of search engine optimizers.
There is a Drupal Sitemap Module, but the last time I checked it had serious bugs that made it unusable. In any case, I don't think that most Web sites need XML sitemaps. Other SEOs have similar opinions about sitemaps.
I recommend not using the Drupal Sitemaps Module.
Make sure that your site does a permanent (301) redirect in one of two directions: from the www version of your domain to the non-www version, or from the non-www version to the www version.
You can set up this redirect in your .htaccess file.
To remove the www from your site, look for the following code in your .htaccess file and uncomment and adapt:
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# uncomment and adapt the following:
# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
To redirect to the www version of the site, look for the following code and uncomment and adapt:
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
Be sure to replace example.com with your domain name, and then test the redirects in a browser.
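When testing, it helps to know in advance what each redirect should look like. The following Python sketch computes the expected target for a given URL so you can compare it with what the browser shows. It is a hypothetical helper (`expected_redirect` is not part of Drupal or Apache) and it assumes the only two host names involved are the www and non-www forms of one domain.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_redirect(url, canonical_host):
    """Return the URL a 301 should point to, or None if no redirect is expected."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    canonical = canonical_host.lower()
    # Work out the bare domain so both host variants can be recognized.
    bare = canonical[4:] if canonical.startswith("www.") else canonical
    if host not in (bare, "www." + bare) or host == canonical:
        return None
    # Keep the path and query string; only the host name changes.
    return urlunsplit((parts.scheme, canonical, parts.path, parts.query, parts.fragment))


print(expected_redirect("http://www.example.com/node/5?page=1", "example.com"))
print(expected_redirect("http://example.com/about", "example.com"))
```

The important detail is that the path and query string survive the redirect; a rule that sends every request to the home page would fail this check.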
There should be one <h1> header element on every page and it should have your keywords in it.
By default, the front page of a Drupal site has nearly identical content to the page at /node. Search engines are going to spider and index /node because on the paginated home page view, the link to the first page in the series points at /node.
The fix for this is simple — always use a custom front page when building a Drupal site.
I haven't seen this problem on Drupal sites in a long time, but if you see PHP session IDs in your URLs, it is very bad for search engines. They have to be removed if you want search engines to be able to spider your site well. A PHP session ID in your URL might look something like this: ?PHPSESSID=37765439acbd6c12345ee987776e65be.
From what I understand, this is the fix if your server supports mod_php — it goes in your .htaccess file:
# Fix PHP session ID problems in Drupal
php_value session.use_trans_sid 0
php_value session.use_only_cookies 1
Otherwise you can probably fix it by modifying your php.ini file (or creating one). I don't know the exact procedure for every host, only that your web site must not have PHP session IDs in the URLs if you want good spidering by search engines. Search Drupal.org or Google for how to turn off PHP session IDs on your server.
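To check whether a list of URLs from your site still carries session IDs, a small script along these lines may help. It is a sketch, not a Drupal tool; the function name `has_session_id` is hypothetical and it assumes the parameter is named PHPSESSID as in the example above.

```python
from urllib.parse import parse_qs, urlsplit


def has_session_id(url, param="PHPSESSID"):
    """Return True if the URL's query string carries a PHP session ID."""
    query = urlsplit(url).query
    names = parse_qs(query, keep_blank_values=True)
    # Compare case-insensitively in case the parameter name was altered.
    return param.lower() in (name.lower() for name in names)


print(has_session_id("http://example.com/node/5?PHPSESSID=37765439acbd6c12345ee987776e65be"))
print(has_session_id("http://example.com/node/5"))
```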
The default Drupal robots.txt file has critical errors in it even in Drupal 6.2 (bug report already filed).
Read this Drupal robots.txt tutorial for more information.
Watch out for contributed modules that create duplicate content through extra URLs. This can be a serious problem.
To learn more about search engine optimization, check out the SEO resources page.
Google Engineer Matt Cutts talks about canonical home page URLs on his blog. The concept is basically this:
For the most part, search engines view different URLs as being entirely different pages. So the following URLs all may show the same content, but search engines will often see them as different pages with duplicate content:
Drupal does not link to its index.php file so the third URL example is generally not an issue with Drupal. However you should choose between using the www version of the domain name or the non-www version of the domain name. Drupal makes this easy by providing instructions in the default .htaccess file as shown below:
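# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# adapt and uncomment the following: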
# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
#
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
When setting up your Drupal site you should decide whether you want your site to have the www subdomain or not and choose one of the two options in the .htaccess file.
For SEO purposes it doesn't matter either way unless your site has already been live for a while. If Google has indexed your site and shows the PageRank value of the home page (as seen in the Google Toolbar or through a Firefox Extension like Search Status), then Google has already chosen one version or the other for your domain name. In that case I would redirect to the version of the domain that Google has already accepted. You can determine which version Google has chosen by typing your domain name into Google without the www like this: example.com
Google should show your domain name at the top of the SERPs. If Google shows your home page with the www then you should redirect your site to the www-version. If they leave off the www then redirect to the version without the www.
Some people would say that it doesn't matter which one you redirect to even if Google has already indexed the site. But sometimes when you 301 redirect pages or sites, Google drops the original URL and it takes a while to get it ranked again. That is why I recommend going with the choice that Google has already made for you.
The old Drupalzilla.com site had a database of Drupal modules with tips for SEO. I've copied some of the information into the pages below.
The Abuse Module allows users to flag content as spam. It outputs an extra link at the bottom of teasers and full nodes.

The image above shows the link that is created at the bottom of nodes and comments that allows users to flag content for review by the moderators. The URLs that are linked-to have the structure http://example.com/abuse/report/comment/347. If you have a node with 15 comments, the Abuse Module will create 16 extra pages on the site, 15 abuse form pages for the comments and one for the node.
To fix this problem, the following rule should be added to your robots.txt file when using the Abuse Module:
Disallow: /abuse/
The Forum Module is part of Drupal core. If you enable the Forum Module, there are some things to be aware of.
Whenever Web sites create tables that are sortable by column headers, you are looking at potential duplicate content.

The image above shows table headers in Drupal's Forum Module. When you click on one of those links, it re-sorts the data in the table appending parameters to the original URL.
In the example image above, the original URL structure is http://example.com/forums/introduce-yourself. Drupal's Forum Module creates the following additional URLs in the header links:
| Link Text | URL |
|---|---|
| Title | http://example.com/forums/introduce-yourself?sort=asc&order=Topic |
| Replies | http://example.com/forums/introduce-yourself?sort=asc&order=Replies |
| Created | http://example.com/forums/introduce-yourself?sort=asc&order=Created |
| Last Reply | http://example.com/forums/introduce-yourself?sort=desc&order=Last+reply |
After visiting those pages you (and spiders) will also find the following URLs:
| Link Text | URL |
|---|---|
| Title | http://example.com/forums/introduce-yourself?sort=desc&order=Topic |
| Replies | http://example.com/forums/introduce-yourself?sort=desc&order=Replies |
| Created | http://example.com/forums/introduce-yourself?sort=desc&order=Created |
| Last Reply | http://example.com/forums/introduce-yourself?sort=asc&order=Last+reply |
Pagination of the forums makes it even worse because each page can then be sorted in these 8 ways. Here is one example from Drupal.org: http://drupal.org/forum/2?sort=asc&order=Last+reply&page=393.
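The count of 8 can be reproduced with a short Python sketch that enumerates every sort link for one forum page. The function name `sort_variants` is hypothetical; the column names and parameter order are taken from the tables above.

```python
from itertools import product
from urllib.parse import urlencode


def sort_variants(base_url, columns=("Topic", "Replies", "Created", "Last reply")):
    """List every sortable-header URL for one forum page."""
    urls = []
    for column, direction in product(columns, ("asc", "desc")):
        # urlencode turns the space in "Last reply" into a plus sign.
        urls.append(base_url + "?" + urlencode({"sort": direction, "order": column}))
    return urls


variants = sort_variants("http://example.com/forums/introduce-yourself")
print(len(variants))
for url in variants:
    print(url)
```

Four columns times two directions gives 8 URLs for each page of the forum, so a forum with many paginated pages multiplies quickly.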
The recommended fix for this problem is to add the following line to the robots.txt file:
Disallow: /*sort=
The following line should be added to the default Drupal robots.txt file because the Forum Module is distributed with Drupal:
Disallow: /*sort=
The Forward Module adds a link to each teaser and full node that allows users to email the node to people.

The image above shows the link that is created on every node. The URL structure of the link is http://example.com/forward/343. If your site has 3000 nodes, the Forward Module will create 3000 extra pages with nothing but a form that allows people to email the nodes to their friends.
To fix the issue, add the following line to your robots.txt file:
Disallow: /forward/
The Global Redirect module has three main features:
If you search around the Web for Drupal SEO tutorials, many people recommend using mod_rewrite rules in an .htaccess file to deal with issues like removing trailing slashes. But on sites that also have non-Drupal content, you may have URLs that do have trailing slashes.
A slash is the symbol for a directory. For example, in the URL http://example.com/ the trailing slash is the symbol for the root directory of example.com. If you leave the trailing slash off, the server will add it. If you request a physical directory on a Drupal site (or any site) like http://example.com/modules the server will correct you by appending the trailing slash: http://example.com/modules/. If you have non-Drupal content on your server—perhaps a WordPress blog at http://example.com/software/—you will have URLs with trailing slashes. The WordPress blog would not be located at http://example.com/software, it would be located at http://example.com/software/. You would not want to remove trailing slashes from all URLs.
That is why the Global Redirect module is a good option. It will only remove trailing slashes from URLs that are handled by Drupal.
The Image Module allows users to upload images as nodes.
It creates duplicate content on sites—at least two duplicate URLs for every image node created.

In the image above, the link to "Thumbnail" appends the query string ?size=thumbnail to the URL and redisplays the content. Once you are on the thumbnail page, a link to the preview page will be displayed. If you have allowed anonymous users to "View Original Image" in the Access Control settings, then there will be an additional link to the original image.
The URLs of duplicate content that a default installation of the Image Module creates are shown below:
The names of the image sizes are controlled through the Image Module settings at http://example.com/admin/settings/image:

So, for example, if you created an additional image size called tiny, the Image Module would then create an extra page of duplicate content for each image node on the site by appending ?size=tiny to the original URLs of the nodes.
To fix this issue, add the following line to your robots.txt file:
Disallow: /*size=
Drupal's Paging Module is a popular way to break up nodes across multiple pages. This module does create some problematic SEO issues though.

As shown in the image above, the Paging Module is able to break up each node into multiple pages which creates more URLs. For example, if the page above had the URL http://example.com/page-title, the following other URLs would be created for the paginated views:
That results in a single page of content with two different URLs: http://example.com/page-title?page=0,0 contains duplicate content of the node's main URL http://example.com/page-title.
The current SEO fix is to add the following line to your robots.txt file to prevent the duplicate pages from being indexed:
Disallow: /*?page=0,0$
The syntax in the above robots.txt rule is recognized by Google Search, Yahoo Search, and MSN Live Search.
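To see how a wildcard rule like this behaves before you deploy it, you can model the matching in Python. This is a rough sketch under two assumptions: that `*` matches any run of characters and that a trailing `$` anchors the end of the URL, which is the extended syntax the rule above relies on. The function names are hypothetical and this is not any search engine's actual matcher.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def is_blocked(pattern, path):
    """Return True if the path (including query string) matches the rule."""
    # Rules always match from the beginning of the path.
    return robots_pattern_to_regex(pattern).match(path) is not None


print(is_blocked("/*?page=0,0$", "/page-title?page=0,0"))
print(is_blocked("/*?page=0,0$", "/page-title?page=0,1"))
print(is_blocked("/*?page=0,0$", "/page-title"))
```

Only the duplicate of the main URL is blocked; the other paginated views and the main URL itself remain crawlable.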
Future versions of this module should be built so that the main URL is not duplicated. The link back to the main node page should not have a query string. Also, it would be best if the URLs that it generates were not dynamic.
The following example shows a possible URL structure for this module that would be better for search engine indexing:
The Path Module is a Drupal core module. Enabling it allows you to create URL aliases.
Drupal's standard URLs (once you have enabled "clean URLs" in the Admin panel) are in this format:
http://example.com/node/25
Once you have enabled the Path Module you will be able to create URL aliases for each URL. If you created a URL alias for that URL (node 25) called custom-page-title, you would then be able to access the content of node 25 at http://example.com/custom-page-title.
You would also still be able to access the content of node 25 at http://example.com/node/25. Generally, you do not have to worry about this unless your site has already been indexed with the original "node" URLs. In either case you could install the Global Redirect Module which would automatically redirect http://example.com/node/25 to your URL alias at http://example.com/custom-page-title.
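The redirect behavior described above can be pictured with a tiny Python model. It is only an illustration of the outcome, not the Global Redirect Module's code; the `resolve` function and the alias dictionary are hypothetical.

```python
def resolve(path, aliases):
    """Return (status, path) for a request, given a map of node path to alias."""
    if path in aliases:
        # The internal path has an alias, so send a permanent redirect to it.
        return 301, aliases[path]
    # Aliases and unaliased paths are served directly.
    return 200, path


aliases = {"/node/25": "/custom-page-title"}
print(resolve("/node/25", aliases))
print(resolve("/custom-page-title", aliases))
```

With the redirect in place, each piece of content answers with a 200 status at exactly one URL.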
A related module is the recommended Pathauto Module which automatically creates URL aliases for each node on your site.
The Spam Module filters content and comments for spam, and lets users flag content for review by the administrators.
The Drupal Spam Module creates URLs on the site like:
http://example.com/spam/report/comment/1
To prevent low-quality pages from being indexed, add the following line to your robots.txt file when using the Spam Module:
Disallow: /spam/
The Tracker Module creates "track" pages for each user.
For example, a page that tracks user #234 would have a tracker page located at http://example.com/user/234/track. Those pages should be blocked from search engines with the following rule:
Disallow: /*/track$
That robots.txt syntax is recognized by Google Search, Yahoo Search, and MSN Live Search.
The tracker module also keeps track of recent posts on the site at URLs like http://example.com/tracker and on large sites creates thousands of tracker pages like http://drupal.org/tracker?page=6379.
My recommendation is to leave the first page of the Recent Posts (http://example.com/tracker) exposed to search engines, while blocking the paginated tracker pages like http://example.com/tracker?page=50. Leaving just the first page of /tracker exposed to search engines allows search engines to rapidly find and index your latest content as it is posted.
The following rule blocks all but the first of your site-wide tracker pages:
Disallow: /tracker?
Search engines like Google and Yahoo are based on Unix (Linux or BSD). Unlike on Windows, filenames on Unix servers are case-sensitive. That means a file called INDEX.HTML is a different file than index.html.
Drupal has an SEO issue where URLs are not case sensitive. I'll explain why this is a problem.
Here's an example of case-sensitive URLs on a Unix server—Google.com:

One page with more than one URL can be seen as duplicate content in the eyes of search engines. A page that shows the same content regardless of the letter case of the URL is showing duplicate content.
Here's an experiment I did with Drupal and case sensitive URLs. It shows that both versions are indexed by Google as duplicate content:

I posted an issue here. I think it's a MySQL problem. Here's the code from the Drupal 5.7 Path Module:
case 'load':
  $path = "node/$node->nid";
  // We don't use drupal_get_path_alias() to avoid custom rewrite functions.
  // We only care about exact aliases.
  $result = db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path);
  if (db_num_rows($result)) {
    $node->path = db_result($result);
  }
  break;
Here's what MySQL.com says:
The default character set and collation are latin1 and latin1_swedish_ci, so non-binary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.
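The effect of a case-insensitive collation is easy to reproduce. The Python sketch below uses SQLite as a stand-in for MySQL, with a NOCASE collation playing the role of latin1_swedish_ci. The simplified two-column table and the `lookup` function are assumptions for illustration, not Drupal's schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE on the alias column imitates a case-insensitive MySQL collation.
conn.execute("CREATE TABLE url_alias (src TEXT, dst TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO url_alias VALUES ('node/25', 'custom-page-title')")


def lookup(alias, case_sensitive=False):
    """Return the internal path for an alias, or None if nothing matches."""
    sql = "SELECT src FROM url_alias WHERE dst = ?"
    if case_sensitive:
        # An explicit binary collation forces an exact, case-sensitive match.
        sql += " COLLATE BINARY"
    row = conn.execute(sql, (alias,)).fetchone()
    return row[0] if row else None


print(lookup("Custom-Page-Title"))
print(lookup("Custom-Page-Title", case_sensitive=True))
```

With the default collation, the mixed-case alias still resolves to the node, which is how two URLs end up showing the same page.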
I think that it's something that needs to be fixed in the Path Module and/or added to the Global Redirect module.
A Drupal site isn't going to be affected by this naturally. It would only happen if someone working on the site manually links to URLs in a different case than the case of the Drupal URL aliases. I wouldn't call it a "critical" issue, but it definitely should be fixed as soon as possible. Theoretically it could be used to maliciously affect a site's rankings.
Comments and opinions welcome.
NOTE: This tutorial is no longer current. Please see the Drupal SEO Tutorial for current information on Drupal 5 and Drupal 6.
There are two problems in Drupal 4.7 that may cause problems with search engine spiders.
Tip: .htaccess is only used with Drupal on an Apache server. If you are using Windows and want to install Apache, try Apache2triad which includes Apache, PHP, MySQL, Perl, Python, and much more. Apache2triad installs with a double-click. You can run Drupal on IIS, but I don't think it's a good idea.
If you don't know what URL canonicalization is, read this first.
The default .htaccess in Drupal 4.7 has some lines that you can uncomment to redirect your visitors in one of the following two ways:
This is the relevant section of the default Drupal .htaccess file — it is a bad idea to use this code on your site:
RewriteEngine on
# If your site can be accessed both with and without the prefix www.
# you can use one of the following settings to force user to use only one option:
#
# If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# RewriteRule .* http://www.example.com/ [L,R=301]
#
# If you want the site to be accessed only WITHOUT the www. , adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
# RewriteRule .* http://example.com/ [L,R=301]
It is a bad idea to use these default RewriteRules because they will only redirect to the Drupal home page. For example, they will redirect a request for http://example.com/MyPage to http://www.example.com/, when it should redirect to http://www.example.com/MyPage. A site should redirect to the requested page, not back to the home page.
This default Drupal .htaccess file is dangerous because external web sites might link to a page on your site like http://example.com/MyBestPage and if you use the default Drupal RewriteRules it will redirect the search engines (and visitors) to http://www.example.com/ — the "www" version of your home page; not the intended page. Don't risk confusing the search engines with 301 (permanent) redirects to your home page when you don't intend for them to go to your home page.
To fix this problem, use the following lines in your Drupal .htaccess file instead, right after the line that says RewriteEngine On, replacing example.com with your domain name:
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
If you prefer to remove the www then use the following rule instead:
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
Tip: If you want to know the details on how those rewrite rules work, check out this mod_rewrite cheat sheet.
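The difference between the default rules and the corrected ones comes down to the captured path. The Python sketch below emulates one negated RewriteCond plus one RewriteRule to show it. It is a simplified model, not mod_rewrite itself: the `apply_rule` function is hypothetical, it handles only a single `$1` backreference, and it assumes the per-directory behavior where the rule sees the path without its leading slash.

```python
import re


def apply_rule(host, path, host_pattern, rule_pattern, target):
    """Emulate 'RewriteCond %{HTTP_HOST} !pattern [NC]' followed by one RewriteRule."""
    if re.search(host_pattern, host, re.IGNORECASE):
        return None  # Host is already canonical, so the negated condition fails.
    relative = path.lstrip("/")  # In .htaccess the leading slash is stripped.
    match = re.search(rule_pattern, relative)
    if match is None:
        return None
    captured = match.group(1) if match.groups() else ""
    # Expand the $1 backreference, if the target uses one.
    return target.replace("$1", captured)


host_rule = r"^www\.example\.com$"
print(apply_rule("example.com", "/MyPage", host_rule, r".*", "http://www.example.com/"))
print(apply_rule("example.com", "/MyPage", host_rule, r"(.*)", "http://www.example.com/$1"))
```

The first call models the Drupal 4.7 default and lands on the home page; the second models the corrected rule and keeps the requested page.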
Drupal has a problem when you are running PHP as CGI. Instead of sending "404 Not Found" errors when it can't find a page, it will send "404 Ok" errors. You can read more about it on PHP.net. When a search engine spider requests a page that doesn't exist, you want to send a proper "404 Not Found" header.
To see if you are sending faulty "404 Ok" headers, you can use Firefox with the LiveHTTPheaders extension. After you have installed that extension and restarted Firefox, go to Tools —> Live HTTP headers. That will open up the header-viewer window. Then go to your web site to a page that doesn't exist (like http://www.example.com/asdf1234). Check the LiveHTTPheader window to see if it sends a correct "404 Not Found" header or an incorrect "404 Ok" header. If it says "404 Ok", then there is a problem and you can fix it as explained below.
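If you have many status lines to review, the same check can be scripted. This Python sketch compares the reason phrase in a status line with the standard phrase for that code. The function name `check_status_line` is hypothetical, and the status lines are assumed to be copied by hand from the header viewer.

```python
from http import HTTPStatus


def check_status_line(status_line):
    """Return (code, reason, ok) where ok means the reason matches the code."""
    _version, code_text, *reason_words = status_line.split()
    code = int(code_text)
    reason = " ".join(reason_words)
    # HTTPStatus knows the standard reason phrase for each status code.
    expected = HTTPStatus(code).phrase
    return code, reason, reason.lower() == expected.lower()


print(check_status_line("HTTP/1.1 404 OK"))
print(check_status_line("HTTP/1.1 404 Not Found"))
```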
To fix the Drupal 404 error problem open up the file /includes/common.inc. In Drupal 4.7.3, it is about line 288 where you will find this:
drupal_set_header('HTTP/1.0 404 Not Found');
Change that line to:
drupal_set_header('Status: 404 Not Found');
Then check your headers again in Firefox with LiveHTTPheaders. If it says "404 Not Found" then your problem is solved. If it doesn't work, leave a comment below and let me know...
As mentioned below in the comments there is also a "403 OK" error that can exist on some configurations. For an example on how to fix the similar "301 OK" PHP header error, see my post on PHP redirects.
See also the Drupal bug report page for this problem.
Note: The HTTP 1.0 specifications say that "the Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase." However, I did have a problem where Google would not remove some of my pages even with the manual removal tool until I fixed the headers from "404 OK" to "404 Not Found". I'm not sure what the current status on this issue is, but to be safe I recommend sending a correct "404 Not Found" header. Not all bots may be programmed according to the standards.
An important aspect of Drupal SEO is the robots.txt file. Drupal 5 was the first version of Drupal that came with a robots.txt file, but it still needs some modifications.
One of the most serious SEO problems with Drupal is duplicate content. With the addition of contributed modules it can get so bad that one might refer to it as druplicate content. (ow...)
A key element of SEO on sites is getting a good, clean crawl. A robots.txt file is important for a clean crawl because it tells robots where they aren't supposed to go. There are many places on a Drupal site that search engine crawlers shouldn't go.
I've attached Drupal 5's default robots.txt file for reference and will address it in sections:
The first thing I would do is remove the Crawl-delay line. Unless you have a very large site or spidering problems, it's not needed. The other robots.txt rules that I mention here should help cut down on the number of pages crawled.
User-agent: *
Crawl-delay: 10
The next section of the default robots.txt file addresses the physical directories created by Drupal:
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
That section can be left as-is. Just keep in mind that it will probably keep search engines out of your logo and image files also because you are blocking your /sites/, /modules/, and /themes/ directories. If you use an alternate logo image, rename it so that it includes a keyword and place it in your /files/ directory.
The next section addresses files that are included with Drupal. I've never seen any of these files indexed, but you can leave this section in if you wish. Don't delete your CHANGELOG.txt file as some people recommend, because it lets you know what version of Drupal you are running in case you forget later.
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
This is the most important section of the default robots.txt file because it contains some errors:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
Drupal doesn't have trailing slashes on the URLs, so you may want to remove trailing slashes from some of the rules as shown below:
Disallow: /admin/
Disallow: /aggregator
Disallow: /comment/reply/
Disallow: /contact
Disallow: /logout
Disallow: /node/add
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
For example, each "Login or register to post comments" link on each node creates URLs like http://example.com/user/login?destination=comment/reply/806%2523comment_form and http://example.com/user/register?destination=comment/reply/806%2523comment_form. Drupal's default robots.txt rules will not block search engines from spidering those URLs, but if you remove the trailing slashes as I've mentioned above, it will.
The Aggregator Module creates URLs of duplicate content like http://example.com/aggregator?page=3 that are not blocked by the default robots.txt file. Removing the trailing slash on the end of "/aggregator/" in the default robots.txt file will solve that problem.
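The reason the trailing slash matters is that a plain robots.txt rule is a prefix match against the path and query string. The short Python sketch below models that with `startswith`; the `blocked_by` function is hypothetical and deliberately ignores wildcards.

```python
def blocked_by(rules, path_and_query):
    """Return the Disallow prefixes that match the given path and query string."""
    # A plain rule blocks any URL whose path begins with the rule text.
    return [rule for rule in rules if path_and_query.startswith(rule)]


login_url = "/user/login?destination=comment/reply/806%2523comment_form"
print(blocked_by(["/user/login/"], login_url))
print(blocked_by(["/user/login"], login_url))
print(blocked_by(["/aggregator/"], "/aggregator?page=3"))
print(blocked_by(["/aggregator"], "/aggregator?page=3"))
```

The rules with a trailing slash return an empty list because the URL continues with a question mark rather than a slash.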
The next section of the robots.txt file addresses paths that should be blocked if you aren't using clean URLs:
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
UPDATE: Please ignore the following lines. Further testing has shown that this rule will block all dynamic URLs in Google. So don't use it!
Most of the people reading these Drupal SEO tutorials are using clean URLs. If you are using clean URLs you can delete that section and replace it with the following line:
Disallow: /?
That line would block all of the URLs that start with ?q= as well as other miscellaneous query strings that might later appear for various reasons.
If you are not using clean URLs, modify the above section using the same logic as for the "clean paths" section above it. If your site has been indexed without clean URLs (for example, the page http://example.com/?q=node/25 has PageRank and you are going to implement clean URLs), you should use .htaccess to do 301 redirects from the dynamic versions of the URLs to the clean ones. In that case do not block the dynamic URLs from search engines because you would want them to transfer the PageRank of the dynamic URLs to the clean URLs. If that issue applies to you and my explanation doesn't make sense, please let me know in a comment below and I'll try to explain it another way.
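Before writing the redirect rules, it can help to list which clean URL each dynamic URL should map to. The Python sketch below does that mapping. It is a planning aid rather than the redirect itself, and the function name `clean_url_for` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_url_for(url):
    """Map a ?q= style URL to its clean equivalent, or None if it has no q parameter."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    q_values = [value for name, value in params if name == "q"]
    if not q_values:
        return None
    # Any other parameters (such as page) stay in the query string.
    rest = [(name, value) for name, value in params if name != "q"]
    path = "/" + q_values[0].lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(rest), ""))


print(clean_url_for("http://example.com/?q=node/25"))
print(clean_url_for("http://example.com/?q=node&page=2"))
```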
I also recommend adding the following rules, after carefully reading and understanding the explanations given with them:
Each module can add many extra URLs to the site, which often creates massive amounts of duplicate content and increases the crawling load on your server. The following extra robots.txt rules address some of the core modules.
Disallow: /node$
Disallow: /user$
Disallow: /*sort=
Disallow: /search$
Disallow: /*/feed$
Disallow: /*/track$
Disallow: /tracker?
/tracker exposed to search engines allows them to rapidly find and index your latest content as it is posted.
Disallow: [front page] (replace with the path to your alternate front page)
An improved version of Drupal's robots.txt file that summarizes the explanations above can be downloaded here.
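The extra rules rely on the * and $ pattern characters that Google supports in robots.txt. Not every crawler honors them. The following Python sketch shows how those patterns behave, assuming that * matches any run of characters and a trailing $ anchors the end of the URL. The helper names are hypothetical and this is not how any search engine is actually implemented.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow pattern with * and a trailing $ into a compiled regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape everything literally (including ?), then let * match any characters.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def matches_any(path, rules):
    """Return True if the path (including any query string) matches at least one rule."""
    return any(rule_to_regex(rule).search(path) for rule in rules)


rules = ["/node$", "/user$", "/*sort=", "/search$", "/*/feed$", "/*/track$", "/tracker?"]

for path in ["/node", "/node/25", "/tracker", "/tracker?page=2",
             "/forum?sort=asc&order=Topic", "/taxonomy/term/3/feed"]:
    print(path, matches_any(path, rules))
```

Under these assumptions /node and /tracker?page=2 are blocked, while /node/25 and /tracker itself remain crawlable, which matches the intent of the rules above.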
Please see the Drupal SEO Module Database for instructions about specific rules. If you have questions about a specific module that I haven't covered yet, please contact me and I'll try to review the module as soon as possible.
This is a quick hack to Drupal 5 to add rel=nofollow to the comment authors' homepages. I'm not recommending adding nofollow to comments, only describing the technique for people who are looking for it.
Generally it isn't a good idea to modify Drupal core code, but this method works.
Open /includes/theme.inc.
Change line 1052 from:
$output = l($name, 'user/'. $object->uid, array('title' => t('View user profile.')));
to:
$output = l($name, 'user/'. $object->uid, array('title' => t('View user profile.'), 'rel' => 'nofollow'));
And change line 1064 from:
$output = l($object->name, $object->homepage);
to:
$output = l($object->name, $object->homepage, array('rel' => 'nofollow'));
Technique originally described here.
This is a series of three tutorials I wrote in 2007 with suggestions for optimizing Drupal.org for better search engine rankings.
This tutorial offers some advice on how to optimize Drupal.org for search engines. This is part one of a series.
The following table shows a ranking drop in Google over the past 8 months for some of Drupal's main keywords:
| Keyword | Rank on 28 Jan 2007 | Rank on 15 Aug 2007 |
|---|---|---|
| content management system | #7 | #15 |
| cms | #7 | #36 |
Drupal is obviously one of the best, most flexible, open-source content management systems available. I think that Drupal is the best general CMS because it is very flexible, it runs on standard LAMP servers, and it is usable even by people without much of a technical background. It would be great to see Drupal.org in the top 5 on Google for both of those keywords.
This series of articles on SEO for Drupal.org will attempt to address issues that may help increase organic search engine traffic.
Drupal.org has some basic SEO issues. One of the problems is the massive amount of duplicate content that can be spidered by search engines. In addition to duplicate content issues, having those pages exposed to spiders also puts a heavier load on the server because of the number of pages that can be crawled.
Google has not been obeying robots.txt files lately, but it's a good idea to use robots.txt files correctly in preparation for when Google fixes their crawling problem. (An example of the Googlebot bug can be seen here where Google has indexed and cached sections of this site that have been blocked off with robots.txt since it was launched.)
I've written a Drupal Robots.txt Tutorial which explains some errors in Drupal 5's default robots.txt file. I've summarized recommended changes to http://drupal.org/robots.txt in the attached PDF file.
The <title> element of a Web page tells search engines what the page is about. The home page title element is especially important. The current home page title element has the text drupal.org | Community plumbing. This is how Google displays it in the SERPs:

It isn't the most attractive listing and might not attract as many clickthroughs as a more descriptive title element. I recommend changing the home page title element of Drupal.org to:
<title>Drupal | Open Source Content Management System (CMS)</title>
or
<title>Drupal CMS | An Open Source Content Management System</title>
Even better would be to add the keyword PHP. Many people are searching for things like "how do I make a cms in php?". Displaying those keywords in the home page title would be helpful. For example:
<title>Drupal CMS | Open Source Content Management System in PHP</title>
In addition to adding primary keywords to the home page title element, that change would modify the listing of the site in the SERPs so that it looked more descriptive like this:

The important thing is to get both keywords, content management system and cms, in the home page title element.
This article is part two of the series on optimizing Drupal.org for search engines.
Rel=nofollow is a microformat that, when applied to a link, tells search engines such as Google that the linking page does not vouch for the quality of the linked-to page.
Nofollow is used on Drupal.org to reduce the motivation of users to post spam. When the site is viewed with the Firefox Search Status Extension, the nofollowed links show up highlighted in pink, showing the extent of the issue:

The problem with nofollow on Drupal.org is that it is getting applied to internal links. So pages that are important and that get linked to often are not getting the search engine "link juice" that they should.
One solution would be to apply a "nofollow whitelist" to the Input Filter on Drupal.org, so that pages from Drupal, its subdomains, and other official sites always have rel=nofollow removed from their links. That way nodes that are important and that get linked to often by webmasters and users would start to get more link juice and be seen as more important by search engines. That would include the often referenced pages like the excellent Drupal Handbooks.
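The whitelist decision itself is simple. Here is a minimal Python sketch of it, assuming a whitelist that contains only drupal.org and its subdomains. The names WHITELISTED_DOMAINS and needs_nofollow are hypothetical, and this is not the actual Drupal input filter code.

```python
from urllib.parse import urlsplit

# Assumption: the official sites to trust. Extend this set as needed.
WHITELISTED_DOMAINS = {"drupal.org"}


def needs_nofollow(href, whitelist=WHITELISTED_DOMAINS):
    """Return True if a user-submitted link should keep rel=nofollow."""
    host = (urlsplit(href).hostname or "").lower()
    if not host:
        return False  # relative link, so it stays on the same site
    # Trust the domain itself and any of its subdomains, but not look-alikes.
    return not any(host == domain or host.endswith("." + domain) for domain in whitelist)


for href in ["http://drupal.org/handbooks", "http://api.drupal.org/",
             "/node/1", "http://example.com/spam", "http://notdrupal.org/"]:
    print(href, needs_nofollow(href))
```

Matching on the exact domain or a dot-prefixed suffix keeps a look-alike host such as notdrupal.org from slipping through the whitelist.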
Drupal.org has so much link popularity (PageRank 9) that it could rank #1 for just about any Drupal term that it is optimized for. One important factor is to make sure that the relevant keywords appear in the <title> elements, the <h1> elements, and in the body text. I'll use the Drupal Handbooks as an example. The Drupal Handbooks contain some of the best Drupal tutorials on the Web, yet the Handbooks do not rank in Google for those keywords.
If you search Google for drupal tutorials, Drupal.org is only #7 and #8, and the pages listed are just forum threads, not the main tutorial section, the Drupal Handbooks.
Search engine visitors are more likely to search for the keywords drupal tutorials than drupal handbooks. Some of the best Drupal tutorials are found in the Drupal handbook pages, and it would be appropriate for Google to have the Drupal handbooks at #1 for the keywords drupal tutorials.
On the page http://drupal.org/handbooks, I would change the title element from:
<title>Drupal handbooks | drupal.org</title>
to
<title>Drupal handbooks and tutorials | drupal.org</title>
I would also change the H1 element of that page to <h1>Drupal Handbooks and Tutorials</h1>. Currently the H1 element is just <h1>Drupal Handbooks</h1> as shown in the image below:

This is Part Two in a series on SEO recommendations for Drupal.org. Part One can be found here. Part Three covers duplicate content issues.
This is part 3 of a case study on how Drupal.org could be further optimized for search engine rankings.
Google has indexed over 2,500 pages on the subdomain www2.drupal.org. Here is a screenshot:
Since www2.drupal.org is a duplicate of drupal.org, Google is indexing duplicate content, which can hurt rankings. It also puts extra load on the servers because of the extra pages being crawled.
There are two possible solutions:
The first option could be implemented with .htaccess. The second option could be implemented by having the URL http://www2.drupal.org/robots.txt serve the following content:
User-agent: *
Disallow: /
That would prevent search engines from crawling and indexing the duplicate content. (The main robots.txt file at http://drupal.org/robots.txt would serve different content—the regular robots.txt file.)
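Serving different robots.txt content per hostname comes down to a lookup on the requested host. Here is a minimal Python sketch of that choice, assuming www2.drupal.org is the only duplicate host. The function name robots_txt_for and its parameters are hypothetical, and the real setup would live in the web server or site configuration.

```python
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(host, main_robots, duplicate_hosts=("www2.drupal.org",)):
    """Return the robots.txt body to serve for the requested Host header."""
    hostname = host.split(":")[0].lower()  # drop any port and normalize case
    return BLOCK_ALL if hostname in duplicate_hosts else main_robots


# Placeholder for the regular file served on the main site.
regular = "User-agent: *\nDisallow: /admin/\n"

print(robots_txt_for("www2.drupal.org", regular))
print(robots_txt_for("drupal.org", regular))
```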