Drupal SEO: "404 Ok" and .htaccess

NOTE: This tutorial is no longer current. Please see the Drupal SEO Tutorial for current information on Drupal 5 and Drupal 6.

Two issues in Drupal 4.7 may cause problems with search engine spiders.

Drupal .htaccess: Redirecting to www

Tip: .htaccess is only used when Drupal runs on the Apache server. If you are using Windows and want to install Apache, try Apache2triad, which includes Apache, PHP, MySQL, Perl, Python, and much more. Apache2triad installs with a double-click. You can run Drupal on IIS, but I don't think it's a good idea.

If you don't know what URL canonicalization is, read this first.

The default .htaccess in Drupal 4.7 has some lines that you can uncomment to redirect your visitors in one of the following two ways:

  1. http://example.com to http://www.example.com
  2. http://www.example.com to http://example.com

This is the relevant section of the default Drupal .htaccess file. It is a bad idea to use this code on your site as written:

  RewriteEngine on

  # If your site can be accessed both with and without the prefix www.
  # you can use one of the following settings to force user to use only one option:
  #
  # If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  # RewriteRule .* http://www.example.com/ [L,R=301]
  #
  # If you want the site to be accessed only WITHOUT the www. , adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  # RewriteRule .* http://example.com/ [L,R=301]

It is a bad idea to use these default RewriteRules because they will only redirect to the Drupal home page. For example, they will redirect a request for http://example.com/MyPage to http://www.example.com/, when it should redirect to http://www.example.com/MyPage. A site should redirect to the requested page, not back to the home page.

This matters because external web sites might link to a page on your site such as http://example.com/MyBestPage. With the default Drupal RewriteRules, search engines and visitors following that link are redirected to http://www.example.com/, the "www" version of your home page, not the intended page. Don't risk confusing the search engines with 301 (permanent) redirects to your home page when you don't intend for them to go there.

To fix this problem, use the following lines in your Drupal .htaccess file instead, right after the line that says RewriteEngine On, replacing example.com with your domain name:

  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If you prefer to remove the www then use the following rule instead:

  RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  RewriteRule (.*) http://example.com/$1 [R=301,L]
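To see why the capturing group matters, here is a minimal Python sketch that imitates the two kinds of rule. It is only a simulation, not how Apache is implemented. The function name redirect_target and its keep_path flag are made up for this illustration, and it assumes the .htaccess behavior where the pattern is matched against the path without its leading slash.

```python
import re


def redirect_target(host, path, canonical_host, keep_path):
    """Return the redirect URL, or None if the host is already canonical.

    Imitates: RewriteCond %{HTTP_HOST} !^<canonical_host>$ [NC]
    """
    # [NC] makes the host comparison case-insensitive.
    if re.fullmatch(re.escape(canonical_host), host, flags=re.IGNORECASE):
        return None
    # In .htaccess context the pattern sees the path without its leading slash.
    match = re.fullmatch(r"(.*)", path.lstrip("/"))
    # keep_path=True imitates "(.*) ... /$1"; False imitates ".* ... /".
    captured = match.group(1) if keep_path else ""
    return f"http://{canonical_host}/{captured}"


# Default Drupal rule: the requested page is lost.
print(redirect_target("example.com", "/MyPage", "www.example.com", keep_path=False))
# http://www.example.com/

# Rule with (.*) and $1: the requested page is preserved.
print(redirect_target("example.com", "/MyPage", "www.example.com", keep_path=True))
# http://www.example.com/MyPage
```

A request that already uses the canonical host returns None, which corresponds to the RewriteCond not matching and no redirect being sent.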

Tip: If you want to know the details on how those rewrite rules work, check out this mod_rewrite cheat sheet.

Drupal's 404 Ok Error

Drupal has a problem when you are running PHP as CGI. Instead of sending "404 Not Found" errors when it can't find a page, it will send "404 Ok" errors. You can read more about it on PHP.net. When a search engine spider requests a page that doesn't exist, you want to send a proper "404 Not Found" header.

To see if you are sending faulty "404 Ok" headers, you can use Firefox with the LiveHTTPHeaders extension. After you have installed the extension and restarted Firefox, go to Tools > Live HTTP headers to open the header-viewer window. Then request a page on your web site that doesn't exist (like http://www.example.com/asdf1234). Check the LiveHTTPHeaders window to see whether the server sends a correct "404 Not Found" header or an incorrect "404 Ok" header. If it says "404 Ok", there is a problem, and you can fix it as explained below.
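If you would rather script the check than use a browser extension, something like the following Python sketch should work. It uses only the standard library's http.client module. The helper names fetch_status and is_faulty_404 are my own, and /asdf1234 is just an assumed path that does not exist on your site.

```python
import http.client


def fetch_status(host, path="/asdf1234", timeout=10):
    """Request a page and return (status_code, reason_phrase)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def is_faulty_404(status, reason):
    """True for a 404 whose reason phrase is "OK" in any letter case."""
    return status == 404 and reason.strip().lower() == "ok"


if __name__ == "__main__":
    # Replace www.example.com with your own domain before running.
    status, reason = fetch_status("www.example.com")
    print(status, reason, "faulty" if is_faulty_404(status, reason) else "fine")
```

The comparison ignores letter case because the faulty header may show up as either "404 Ok" or "404 OK".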

To fix the Drupal 404 error problem, open the file /includes/common.inc. In Drupal 4.7.3, around line 288, you will find this:

  drupal_set_header('HTTP/1.0 404 Not Found');

Change that line to:

  drupal_set_header('Status: 404 Not Found');

Then check your headers again in Firefox with LiveHTTPheaders. If it says "404 Not Found" then your problem is solved. If it doesn't work, leave a comment below and let me know...

Update: PHP "301 OK" Header Errors

As mentioned below in the comments, there is also a "403 OK" error that can exist on some configurations. For an example of how to fix the similar "301 OK" PHP header error, see my post on PHP redirects.

Drupal.org Bug Report

See also the Drupal bug report page for this problem.

Note: The HTTP/1.0 specification says that "the Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase." However, I did have a problem where Google would not remove some of my pages, even with the manual removal tool, until I fixed the headers from "404 OK" to "404 Not Found". I'm not sure what the current status of this issue is, but to be safe I recommend sending a correct "404 Not Found" header. Not all bots may be programmed according to the standards.


Comments

Nice!

Thanks for the post. I've implemented both of these.

Have you had any luck getting these considered as permanent changes in the drupal codebase?

Thanks again,
Dan

Drupal SEO

Glad that you found it useful. I've let them know, but haven't heard anything back about it.

See also the related article on basic Drupal on-site optimization.

:)

works! really needed this and was on my to-scratch list, thank you!

A question though:
I noticed that a few lines below "drupal_set_header('HTTP/1.0 404 Not Found');" there is also a line dealing with 403 Forbidden.
I also tried changing this from:

'HTTP/1.0 403 Forbidden'

to

'Status: 403 Forbidden'

but didn't get the same result, do you know by chance how to deal with 403s as well?

cheers

Drupal 403 Ok error

I didn't realize that there was also a PHP 403 Ok error, but just checked and it is there. I'm not sure how to fix it. Does anyone speak Russian? This page looks like it might have an answer, but I wasn't able to translate the page with Babelfish.

There's a site where it is possible to translate a few lines at a time: http://translation2.paralink.com

"As is known, there are two ways to send an HTTP status code in PHP:
header("Status: 403 Forbidden")
header("HTTP/1.1 403 Forbidden")
The problem is that these ways, apparently, may or may not work depending on different factors, first of all on whether PHP is installed as an Apache module or as CGI.
I have established the following precisely:
PHP5 (any version, checked repeatedly), installed as a CGI application, works normally with Status:....
PHP 4.4.0, installed..."

Unfortunately I'm no coder, and it's not my mother language, so not really an easy task.. ;P
I tried the same trick you explained above for the 403 error, but it didn't seem to work. Sorry I can't help further, but well, I guess we can live well enough with that anyway ;)

cheers

One more thing off my list!

Nice. I was just checking out my new Drupal 4.7 install in LiveHTTPHeaders and thought "404 OK - I don't want that." Found your page 2 minutes later and had it fixed in about two more minutes. Thanks for writing this up!

Aside from the mentioned Apache2Triad, I recommend the Web-Developer Server Suite. It comes with Drupal v5.

Redirect broke my subdomain

I did the no-www redirect and it broke my subdomain.

no-www redirect

Try this and see if it works:

RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]

The rewrite rule mentioned in the post says "if not http://example.com, then make it http://example.com." The one that I posted in this comment says "if http://www.example.com, then redirect to http://example.com." Any other host name, such as a subdomain, is left alone.
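The difference between the two conditions can be checked with a few lines of Python. This is only a sketch of the matching logic, not Apache itself, and blog.example.com is a made-up subdomain used for illustration.

```python
import re

# Pattern from the post, used there in negated form: !^example\.com$
NEGATED = re.compile(r"^example\.com$", re.IGNORECASE)
# Pattern from this comment, used as written: ^www\.example\.com$
POSITIVE = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def redirects_with_negated(host):
    """Redirect every host that is NOT exactly example.com."""
    return NEGATED.search(host) is None


def redirects_with_positive(host):
    """Redirect only the host www.example.com."""
    return POSITIVE.search(host) is not None


for host in ("example.com", "www.example.com", "blog.example.com"):
    print(host, redirects_with_negated(host), redirects_with_positive(host))
# example.com False False
# www.example.com True True
# blog.example.com True False
```

The last line of output shows the subdomain problem: the negated condition redirects blog.example.com away, while the positive condition leaves it untouched.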

Missed a backslash

You need a backslash in your regular expression to escape the dot between www and example.

RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
- don't forget to escape ----^---- your dot

RewriteRule (.*) http://example.com/$1 [R=301,L]

Typo

Thanks for catching that... It has been fixed...

Good post and about 403 forbidden

The 404OK is an important issue, thanks.

Somehow, the fix for the 403 Forbidden worked fine for me:
http://www.filination.com/tech/2007/04/04/drupal-seo-verifying-404-not-foundforbidden-headers-are-not-ok-and-custom-404-pages/

Thanks

Thanks for the post, it's very useful. I've implemented it on my client site - http://internetcashadvance.com and it works great, thanks!

It took me by surprise that those things aren't in the basic Drupal installation, but version 5.* has the 404 header error fixed, so it looks like the second part of the post is out of date for folks keeping their Drupal up to date.

Thanks again,
internet cash advance webmaster

Thanks

Thanks for your tips
