Drupal SEO: "404 Ok" and .htaccess

NOTE: This tutorial is no longer current. Please see the Drupal SEO Tutorial for current information on Drupal 5 and Drupal 6.

Two issues in Drupal 4.7 can cause trouble for search engine spiders.

Drupal .htaccess: Redirecting to www

Tip: .htaccess is only used when Drupal runs on the Apache web server. If you are using Windows and want to install Apache, try Apache2triad, which includes Apache, PHP, MySQL, Perl, Python, and much more. Apache2triad installs with a double-click. You can run Drupal on IIS, but I don't think it's a good idea.

If you don't know what URL canonicalization is, read this first.

The default .htaccess in Drupal 4.7 has some lines that you can uncomment to redirect your visitors in one of the following two ways:

  1. http://example.com to http://www.example.com
  2. http://www.example.com to http://example.com

This is the relevant section of the default Drupal .htaccess file. It is a bad idea to use this code on your site:

  RewriteEngine on

  # If your site can be accessed both with and without the prefix www.
  # you can use one of the following settings to force user to use only one option:
  #
  # If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  # RewriteRule .* http://www.example.com/ [L,R=301]
  #
  # If you want the site to be accessed only WITHOUT the www. , adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  # RewriteRule .* http://example.com/ [L,R=301]

It is a bad idea to use these default RewriteRules because their substitution is a fixed URL, so every redirected request lands on the Drupal home page. For example, a request for http://example.com/MyPage is redirected to http://www.example.com/ when it should go to http://www.example.com/MyPage. A site should redirect to the requested page, not back to the home page.

This default Drupal .htaccess file is dangerous because external web sites might link to a page on your site, such as http://example.com/MyBestPage. With the default Drupal RewriteRules, search engines and visitors following that link are redirected to http://www.example.com/, the "www" version of your home page, not the intended page. Don't risk confusing the search engines with 301 (permanent) redirects to your home page when that is not where you want them to go.

To fix this problem, use the following lines in your Drupal .htaccess file instead, right after the line that says RewriteEngine on, replacing example.com with your domain name. The pattern (.*) captures the requested path, and $1 appends it to the redirect target:

  # Any host other than www.example.com gets a 301 to the www host; $1 keeps the requested path
  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If you prefer to remove the www then use the following rule instead:

  # Any host other than example.com gets a 301 to the bare domain; $1 keeps the requested path
  RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  RewriteRule (.*) http://example.com/$1 [R=301,L]
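To see why the capture group matters, here is a minimal Python sketch that mimics how a single RewriteRule turns a requested path into a redirect target. It is only an illustration, not how Apache is implemented: the rewrite() helper is hypothetical, it handles nothing but the $1 backreference, and it assumes the path arrives without its leading slash, as it does for rules in a .htaccess file.

```python
import re


def rewrite(path, pattern, substitution):
    """Mimic one RewriteRule (illustrative helper, not an Apache API).

    Returns the redirect target, or None if the pattern does not match.
    """
    match = re.search(pattern, path)
    if match is None:
        return None
    # Expand $1 with the first captured group; with no group it becomes empty.
    captured = match.group(1) if match.re.groups >= 1 else ""
    return substitution.replace("$1", captured or "")


# Default Drupal 4.7 rule: fixed target, the requested path is thrown away.
default_rule = (r".*", "http://www.example.com/")
# Corrected rule: (.*) captures the path and $1 puts it back.
fixed_rule = (r"(.*)", "http://www.example.com/$1")

print(rewrite("MyPage", *default_rule))  # http://www.example.com/
print(rewrite("MyPage", *fixed_rule))    # http://www.example.com/MyPage
```

The only difference between the two rules is the parenthesized pattern and the $1 in the target, which is exactly what keeps a deep link pointing at the same page after the redirect.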

Tip: If you want to know the details on how those rewrite rules work, check out this mod_rewrite cheat sheet.

Drupal's 404 Ok Error

Drupal has a problem when PHP is running as CGI. When it can't find a page, it sends a "404 Ok" status instead of "404 Not Found". You can read more about it on PHP.net. When a search engine spider requests a page that doesn't exist, you want to send a proper "404 Not Found" header.

To see if you are sending faulty "404 Ok" headers, you can use Firefox with the LiveHTTPheaders extension. After you have installed that extension and restarted Firefox, go to Tools > Live HTTP headers. That will open the header-viewer window. Then request a page on your web site that doesn't exist (like http://www.example.com/asdf1234). Check the LiveHTTPheaders window to see whether the response carries a correct "404 Not Found" header or an incorrect "404 Ok" header. If it says "404 Ok", there is a problem, and you can fix it as explained below.
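If you would rather check from a script than from the browser, the same test can be sketched with Python's standard http.client module. This is a minimal sketch under a few assumptions: the function names are made up for this example, www.example.com stands in for your own site, and a 404 is treated as faulty whenever its reason phrase is anything other than "Not Found".

```python
import http.client


def fetch_status_line(host, path="/", port=80, timeout=10):
    """Request a page and return (status_code, reason_phrase) from the response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def has_mismatched_404_reason(status, reason):
    """True for a 404 whose reason phrase is not 'Not Found' (e.g. '404 Ok')."""
    return status == 404 and reason.strip().lower() != "not found"


# Example (needs network access; replace the host with your own site):
# status, reason = fetch_status_line("www.example.com", "/asdf1234")
# print(status, reason, "faulty" if has_mismatched_404_reason(status, reason) else "fine")
```

The check compares the reason phrase case-insensitively, so both "404 Ok" and "404 OK" are reported as faulty.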

To fix the Drupal 404 problem, open the file /includes/common.inc. In Drupal 4.7.3, at about line 288, you will find this:

  drupal_set_header('HTTP/1.0 404 Not Found');

Change that line to:

  drupal_set_header('Status: 404 Not Found');
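The change works because a CGI script does not write the HTTP status line itself. It reports the status to the web server in a Status header field, and the server builds the status line from that. The choice between the two header forms can be sketched as follows, in Python purely for illustration. The function is hypothetical and is not Drupal code; the interface names are the kind of strings PHP reports from php_sapi_name(), such as "cgi" or "apache2handler".

```python
def not_found_header(sapi_name):
    """Pick the raw header line to send for a missing page (illustrative only).

    sapi_name is assumed to be a server-interface name such as "cgi",
    "cgi-fcgi" or "apache2handler".
    """
    if sapi_name.startswith("cgi"):
        # Under CGI the status is passed to the web server as a header field.
        return "Status: 404 Not Found"
    # Otherwise the status line can be given directly.
    return "HTTP/1.0 404 Not Found"


print(not_found_header("cgi"))             # Status: 404 Not Found
print(not_found_header("apache2handler"))  # HTTP/1.0 404 Not Found
```

A check like this would let the same code serve both setups, whereas the one-line edit above simply assumes the site runs PHP as CGI.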

Then check your headers again in Firefox with LiveHTTPheaders. If it says "404 Not Found" then your problem is solved. If it doesn't work, leave a comment below and let me know...

Update: PHP "301 OK" Header Errors

As mentioned in the comments below, a "403 OK" error can also occur on some configurations. For an example of how to fix the similar "301 OK" PHP header error, see my post on PHP redirects.

Drupal.org Bug Report

See also the Drupal bug report page for this problem.

Note: The HTTP/1.0 specification says that "the Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase." However, I did have a problem where Google would not remove some of my pages, even with the manual removal tool, until I fixed the headers from "404 OK" to "404 Not Found". I'm not sure what the current status of this issue is, but to be safe I recommend sending a correct "404 Not Found" header. Not all bots may be programmed according to the standards.