Drupal SEO and Case Sensitive URLs

Search engines like Google and Yahoo run on Unix-like systems (Linux or BSD). Unlike on Windows, filenames on Unix servers are case-sensitive. That means a file called INDEX.HTML is a different file from index.html.

Drupal has an SEO issue where URLs are not case sensitive. I'll explain why this is a problem.

Here's an example of case-sensitive URLs on a Unix server—Google.com:

google-com.png

A single page that is reachable at more than one URL can be seen as duplicate content by search engines. A page that shows the same content regardless of the letter case of its URL is showing duplicate content.

Here's an experiment I did with Drupal and case sensitive URLs. It shows that both versions are indexed by Google as duplicate content:

google-case-sensitive-URLs-drupal.png

I posted an issue here. I think it's a MySQL problem. Here's the code from the Drupal 5.7 Path Module:

    case 'load':
      $path = "node/$node->nid";
      // We don't use drupal_get_path_alias() to avoid custom rewrite functions.
      // We only care about exact aliases.
      $result = db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path);
      if (db_num_rows($result)) {
        $node->path = db_result($result);
      }
      break;

Here's what MySQL.com says:

The default character set and collation are latin1 and latin1_swedish_ci, so non-binary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.
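To see what that collation behavior means for an alias lookup, here is a minimal sketch in Python. It is not Drupal code and it does not use MySQL: it assumes SQLite as a stand-in, with SQLite's NOCASE collation playing the role of a case-insensitive collation like latin1_swedish_ci and BINARY playing the role of a case-sensitive one. The table names alias_ci and alias_bin and the helper find_source() are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical alias tables with the same data.
# alias_ci compares dst case-insensitively, alias_bin compares it byte-for-byte.
conn.execute("CREATE TABLE alias_ci (src TEXT, dst TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE alias_bin (src TEXT, dst TEXT COLLATE BINARY)")
for table in ("alias_ci", "alias_bin"):
    conn.execute(
        f"INSERT INTO {table} (src, dst) VALUES (?, ?)",
        ("node/1", "about-us"),
    )


def find_source(table, requested_path):
    """Return the internal path stored for a requested alias, or None if no row matches."""
    row = conn.execute(
        f"SELECT src FROM {table} WHERE dst = ?", (requested_path,)
    ).fetchone()
    return row[0] if row else None


# The alias is stored in lowercase; the visitor asks for a mixed-case variant.
print(find_source("alias_ci", "About-Us"))   # case-insensitive column: the row matches
print(find_source("alias_bin", "About-Us"))  # binary column: no match
```

With the case-insensitive column, every case variant of the alias resolves to the same node, which is how one page ends up with many URLs. With the binary column, only the exact alias resolves.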

I think that it's something that needs to be fixed in the Path Module and/or added to the Global Redirect module.
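As a rough idea of what such a fix could look like, here is a sketch in Python of the redirect logic. This is an assumption about one possible approach, not the actual code of the Path Module or Global Redirect: the function canonical_redirect() and the set of aliases are hypothetical. The idea is to serve the exact alias normally and answer any case variant with a 301 redirect to the stored alias, so search engines only ever index one URL.

```python
def canonical_redirect(requested_path, aliases):
    """Decide how to answer a request for requested_path.

    Returns a (status, path) tuple:
      200 and the path itself when it matches a stored alias exactly,
      301 and the stored alias when it differs only by letter case,
      404 and the path itself when no alias matches at all.
    """
    if requested_path in aliases:
        return 200, requested_path

    # Index the stored aliases by their lowercase form to find case variants.
    by_lowercase = {alias.lower(): alias for alias in aliases}
    canonical = by_lowercase.get(requested_path.lower())
    if canonical is not None:
        return 301, canonical

    return 404, requested_path


# Hypothetical aliases, as they might be stored for a small site.
aliases = {"about-us", "Products/Widgets"}

print(canonical_redirect("about-us", aliases))          # exact match
print(canonical_redirect("About-Us", aliases))          # case variant
print(canonical_redirect("products/widgets", aliases))  # case variant of a mixed-case alias
print(canonical_redirect("contact", aliases))           # unknown path
```

The sketch assumes no two stored aliases differ only by case. If they did, the lowercase index would keep just one of them, so a real implementation would need to decide how to handle that.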

A Drupal site won't be affected by this on its own. It only happens when someone links to a URL in a different case than the case of the Drupal URL alias, whether by accident or on purpose. I wouldn't call it a "critical" issue, but it should be fixed as soon as possible. In theory, someone could use it to harm a site's rankings deliberately by linking to wrong-case versions of its URLs.

Comments and opinions welcome.

Comments

Seems to be fixed in recent Global Redirect releases

See this issue http://drupal.org/node/349537 for D5 and D6

Global Redirect and case sensitive URLs

Great news...
