Drupal Tutorials

This section of the site contains Drupal tutorials. When browsing the Drupal-specific areas of this site, look for the Drupal tutorials menu in the left sidebar. The tutorials are also listed below.

Good places to start are the Drupal theme tutorial, the Pathauto tutorial, and the Drupal SEO tutorial.

About Drupal

Drupal is a free open-source content management system (CMS) that is powered by PHP and MySQL. After much experimentation I believe that Drupal is the best open-source content management system available. WordPress is the only other open-source content management system that I use.

Why I prefer Drupal over WordPress

WordPress was designed for blogging. It can also be used as a lightweight content management system. For heavy-duty sites I recommend using Drupal instead of WordPress. Drupal has many advantages over WordPress like:

Drupal's Learning Curve

Drupal does have a bit of a learning curve. I am focusing these tutorials on two aspects of Drupal: issues that beginners have trouble with, and Drupal SEO. Some of the tutorials here were originally written for Drupal 4.7, but they are generally updated for Drupal 5 and 6 as my time allows. If you have questions for me about what I've written, please post them in the comments or in our Webmaster Forum.

Drupal Book and Documentation

I highly recommend the excellent book Pro Drupal Development. The official documentation also has a lot of good information and Drupal recipes.

More Drupal Tutorials

If you have a request for a tutorial on a specific aspect of Drupal, please post your request in our new Webmaster Forum, and I'll do my best to write a tutorial to cover your questions when I have time.

Drupal Hosting

I've tried running Drupal sites on many cheap hosting companies, including ones in the following list. I highly recommend using Site5 hosting for inexpensive Drupal hosting. This Drupal site runs on Site5 hosting and regularly handles several thousand visitors per day without problems.

Other cheap hosting companies that I've installed Drupal on:

How to Install Drupal - It's Easy

UPDATE: This tutorial was for Drupal 4.7. With Drupal 5 and Drupal 6 installation is much easier. Just upload the files to your server and visit your site. Follow the directions on the screen and you are done!

I've read some reviews of Drupal online that claim Drupal is hard to install.

I'm not sure what kinds of problems people have had, but installing Drupal is actually very easy if your server is configured correctly.

There are only four steps:

  1. Upload Drupal to your server (for Drupal hosting, I recommend Site5)
  2. Create a database and database user
  3. Edit the file /sites/default/settings.php to add your database connection information
  4. Create the database tables using the appropriate .sql file in the /database directory

An easy way to install Drupal

Downloading and Uploading Drupal

First download Drupal from Drupal.org.

Uncompress the file. If you are using Windows and are not sure how to uncompress a file, download a free trial version of WinRAR. WinRAR is similar to WinZIP, but I think it's better.

Upload the files in the Drupal directory to your web server with an FTP program. If you are using GNU/Linux/BSD, Konqueror and gFTP are two good FTP programs. If you are using Windows, try a free FTP program called Filezilla (or Portable Filezilla).

Creating a MySQL database and database user

You should create a database in MySQL for your Drupal installation. You can usually do that through your web hosting control panel. You need to create a database, a database user, and then add the database user to the database with "all" privileges. If you don't know how to create a MySQL database, check your hosting company's help documentation.
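If your host gives you shell access to the mysql client, the same three tasks (create a database, create a user, grant the user all privileges) can be sketched as SQL. This is only an illustration: drupal_db_sql is a hypothetical helper, the names passed to it are placeholders, and it assumes a MySQL version that supports CREATE USER and a password without single quotes.

```shell
#!/bin/sh
# Sketch only: prints the SQL for a new Drupal database and user.
# drupal_db_sql and every name passed to it are hypothetical placeholders.
drupal_db_sql() {
  db=$1 user=$2 pass=$3
  printf 'CREATE DATABASE %s;\n' "$db"
  printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$user" "$pass"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$db" "$user"
}

# Print the statements so they can be reviewed first:
drupal_db_sql drupal_db drupal_user 'change-me'

# To run them, pipe the output into the mysql client as an admin user:
#   drupal_db_sql drupal_db drupal_user 'change-me' | mysql -u root -p
```

Printing the statements before running them makes it easy to check the names, which matters on shared hosting where database names often get a prefix.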

Edit settings.php

You then need to add your database connection settings to the file located at /sites/default/settings.php.

Find the line that says:

$db_url = 'mysql://username:password@localhost/databasename';

Fill in your database username, the database user's password, the database name, and the database host (usually just localhost).
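To see how the four values fit into that line, here is a small sketch that assembles it. db_url_line is a hypothetical helper and the values are placeholders. Because the setting is a URL, a password containing characters such as @ or / would probably need to be URL-encoded first.

```shell
#!/bin/sh
# Sketch only: db_url_line is a hypothetical helper that prints the
# $db_url line for settings.php from its four parts.
db_url_line() {
  user=$1 pass=$2 host=$3 db=$4
  printf "\$db_url = 'mysql://%s:%s@%s/%s';\n" "$user" "$pass" "$host" "$db"
}

db_url_line drupal_user change-me localhost drupal_db
# prints: $db_url = 'mysql://drupal_user:change-me@localhost/drupal_db';
```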

Create the database tables

Navigate to the /database directory inside the Drupal folder on your local computer.

Here is a list of the files in that directory from the current Drupal version 4.7:

Drupal database files

You are probably using MySQL. Your hosting control panel should let you know which version of MySQL is installed. If you can't find out what version of MySQL is installed, you can create a file called info.php that contains the following lines:

<?php
  // phpinfo() prints its report directly, so no echo is needed.
  phpinfo();
?>

Upload that info.php file to the root of your Web server and then open it in a browser (e.g., http://www.EXAMPLE.com/info.php). That page will display information about your PHP and MySQL installation including the MySQL version.

If you are using MySQL 4.0 then you will use the file called database.4.0.mysql. If you are using MySQL 4.1 then you should use the file called database.4.1.mysql. (If using PostgreSQL, then use the file database.pgsql.)
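The version-to-file rule above can be written as a small lookup. This is a sketch: schema_file is a hypothetical helper, and it only knows the two MySQL versions named in this tutorial.

```shell
#!/bin/sh
# Sketch only: schema_file is a hypothetical helper that maps a MySQL
# version string to the Drupal 4.7 schema file named in this tutorial.
schema_file() {
  case $1 in
    4.0|4.0.*) echo database.4.0.mysql ;;
    4.1|4.1.*) echo database.4.1.mysql ;;
    *) echo "no file listed here for MySQL $1" >&2; return 1 ;;
  esac
}

schema_file 4.1.22   # prints database.4.1.mysql
```

If you have shell access, the chosen file could also be loaded with the mysql client rather than a control panel, for example mysql -u drupal_user -p drupal_db < database.4.1.mysql (the user and database names here are placeholders).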

Go to your web hosting control panel and find PHPmyAdmin (most hosting companies offer it). Click on the tab that says "Import" as shown below:

PHPmyAdmin - importing database commands

Use the "Browse" button to find the correct database file to upload. PHPmyAdmin will create your database from that file.

Finished

After you complete those steps Drupal should be installed. Navigate to your new Drupal site (e.g., http://www.EXAMPLE.com/) and you should see a welcome page something like the one shown below, although yours may have a different blue theme:

Drupal welcome page

Follow the directions on that page to create the admin account, and configure your new Drupal site. More information can be found in the Drupal handbooks.

Further tutorials will cover how to set up and customize your new Drupal site.

Shortcut for Installing Drupal & Modules Over SSH

If your server has SSH access you can upload Drupal and all of your modules and themes in their compressed state and then uncompress them on the server. The methods below will greatly speed up the process of installing Drupal, especially if you have a slow Internet connection.

If you don't already have hosting with SSH access, please see the Drupal Hosting page for recommendations.

First upload the compressed Drupal file. Then in a terminal (over SSH) type the following command to uncompress the Drupal file, changing the filename to the current Drupal version:

tar -zxvf drupal-5.5.tar.gz

Then navigate into the new directory, in this case, Drupal 5.5:

cd drupal-5.5

Then move all the new files up one level:

mv * ../

Don't forget the hidden .htaccess file:

mv .htaccess ../

Then move up one directory:

cd ..

Delete the empty folder:

rmdir drupal-5.5

Remove the compressed Drupal file:

rm drupal-5.5.tar.gz
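The commands above could also be wrapped in one function so that the hidden files are not forgotten. This is a sketch, not an official Drupal script: unpack_drupal is a hypothetical helper, and it assumes the archive sits in the current directory and unpacks into a folder with the same base name.

```shell
#!/bin/sh
# Sketch only: unpack_drupal is a hypothetical helper that unpacks a
# Drupal tarball into the current directory and cleans up afterwards.
unpack_drupal() {
  archive=$1                          # e.g. drupal-5.5.tar.gz
  dir=$(basename "$archive" .tar.gz)  # e.g. drupal-5.5
  tar -zxf "$archive" || return 1
  # Move visible and hidden entries (such as .htaccess) up one level.
  for f in "$dir"/* "$dir"/.[!.]*; do
    if [ -e "$f" ]; then
      mv "$f" . || return 1
    fi
  done
  # Remove the now-empty folder, then the archive.
  rmdir "$dir" && rm "$archive"
}

# Usage, from the directory that should hold the site:
#   unpack_drupal drupal-5.5.tar.gz
```

Because rmdir only removes empty directories, the archive is kept if anything failed to move.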

Navigate to your sites/all directory:

cd ./sites/all

and make folders for your modules and themes:

mkdir modules themes

Move into your module directory:

cd modules

Upload your modules in their compressed state to the modules directory. Then run the following script to uncompress all of your modules and delete the compressed files -- hit ENTER after each line:

for i in *.tar.gz
do
tar -zxvf "$i" && rm "$i"
done

How to Make a Drupal Theme

Making a custom Drupal theme is actually quite easy. A Drupal theme is just a few PHP files and a CSS file. I prefer the PHPtemplate theme engine (the default one) but you have several choices. See the bottom of this post for a link to the official Drupal Theme Developer's Guide which has information on other Drupal template engines.

The following information was originally written for Drupal 4.7, but the concepts also work for Drupal 5 and 6.

Navigate to your /themes directory. You should have a theme there called /bluemarine. We will use that as an example.

NOTE: before you edit any files you will copy the theme to another directory and rename it. Your custom themes go in the directory /sites/all/themes/. Details about that come later in this tutorial.

Here is a list of the files in the Bluemarine Drupal template:

Bluemarine Drupal template directory

The Files of a Drupal Template

page.tpl.php and style.css

The page.tpl.php and style.css files are the main files for your Drupal theme. The page.tpl.php is a mix of HTML and PHP. Look at the file and notice which snippets of PHP are used where. For example, the following snippet from the page.tpl.php file inserts the site's <head> information. Just copy that snippet into your own custom Drupal template.

<head>
  <title><?php print $head_title ?></title>
  <?php print $head ?>
  <?php print $styles ?>
  <?php print $scripts ?>
  <script type="text/javascript"><?php /* Needed to avoid Flash of Unstyle Content in IE */ ?> </script>
</head>

The following code from the Bluemarine page.tpl.php file uses PHP if statements to print optional information such as primary links, secondary links, and the site slogan. You control whether those display in the Drupal control panel. The Bluemarine template uses tables, but you can easily remove the tables and make it a 100% CSS-based template.

<table border="0" cellpadding="0" cellspacing="0" id="header">
  <tr>
    <td id="logo">
      <?php if ($logo) { ?><a href="<?php print $base_path ?>" title="<?php print t('Home') ?>">
<img src="<?php print $logo ?>" alt="<?php print t('Home') ?>" /></a><?php } ?>
      <?php if ($site_name) { ?>
<h1 class='site-name'><a href="<?php print $base_path ?>" 
title="<?php print t('Home') ?>"><?php print $site_name ?></a></h1><?php } ?>
      <?php if ($site_slogan) { ?><div class='site-slogan'><?php print $site_slogan ?></div><?php } ?>
    </td>
    <td id="menu">
      <?php if (isset($secondary_links)) { ?>
<div id="secondary"><?php print theme('links', $secondary_links) ?></div><?php } ?>
      <?php if (isset($primary_links)) { ?>
<div id="primary"><?php print theme('links', $primary_links) ?></div><?php } ?>
      <?php print $search_box ?>
    </td>
  </tr>
  <tr>
    <td colspan="2"><div><?php print $header ?></div></td>
  </tr>
</table>

The Drupal style.css File

The style.css file is straightforward. I recommend the Firefox Web Developer Toolbar for creating the style.css file. Use the toolbar's Display ID & Class Details option in the Information menu to view the CSS classes and IDs that Drupal is generating. Then add your own CSS rules to the style.css file.

Other Drupal Theme Files

Other files in the Drupal theme are block.tpl.php, box.tpl.php, comment.tpl.php, and node.tpl.php. Each one controls the layout of certain parts of the template. The comment.tpl.php defines the comment layout as shown below. It is fairly straightforward PHP: "If there is a user picture, print the user picture," and so on.

  <div class="comment<?php if ($comment->status == COMMENT_NOT_PUBLISHED) print ' comment-unpublished'; ?>">
    <?php if ($picture) {
    print $picture;
  } ?>
<h3 class="title"><?php print $title; ?></h3>
<?php if ($new != '') { ?><span class="new"><?php print $new; ?></span><?php } ?>
    <div class="submitted"><?php print $submitted; ?></div>
    <div class="content"><?php print $content; ?></div>
    <div class="links">&raquo; <?php print $links; ?></div>
  </div>

Your First Custom Drupal Theme

Just make a copy of the default Bluemarine template and put it in your Drupal /sites/all/themes/ directory. That directory doesn't exist by default, so you should create it if you haven't already. See the README file in /sites/all/ for more information. Rename the copy of Bluemarine to the name of your new theme. Enable the new theme.
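Over SSH, the copy-and-rename step might look like the sketch below. new_theme is a hypothetical helper and mytheme is a placeholder name. The function refuses to overwrite an existing theme directory.

```shell
#!/bin/sh
# Sketch only: new_theme is a hypothetical helper that copies Bluemarine
# into sites/all/themes under a new name.
new_theme() {
  drupal_root=$1
  name=$2
  dest="$drupal_root/sites/all/themes/$name"
  if [ -e "$dest" ]; then
    echo "$dest already exists" >&2
    return 1
  fi
  # The themes directory doesn't exist by default, so create it.
  mkdir -p "$drupal_root/sites/all/themes" || return 1
  cp -R "$drupal_root/themes/bluemarine" "$dest"
}

# Usage, from the Drupal root directory:
#   new_theme . mytheme
```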

Then strip most of the HTML out of the page.tpl.php file and replace it with the HTML that you would like for your theme. Leave the PHP, modifying it as desired. If you are using Linux for Web development, you can use Quanta Plus as an editor to edit your template files directly on the server. Each time you save the file in Quanta Plus, the remote copy of the file will be updated.

Use the Firefox Web Developer Toolbar's Display ID & Class Details feature to view CSS information on the new template that you are viewing in the browser. Either start a new style.css file from scratch, or modify the existing one to get the template the way you would like. To edit the display of blocks, nodes, and comments, edit the block.tpl.php, node.tpl.php, and comment.tpl.php files respectively.

When you are finished with your template, take a screenshot and resize it to about 150x90 pixels. Upload it to your theme directory as screenshot.png.

Drupal Template Variables

The PHP snippets in the examples above are just printing PHPtemplate variables. A complete list of available PHPtemplate variables that you can use in your template can be found on Drupal.org's PHPtemplate variables page. Below are the available variables from 24 July 2007:

$breadcrumb
HTML for displaying the breadcrumbs at the top of the page.
$closure
Needs to be displayed at the bottom of the page, for any dynamic javascript that needs to be called once the page has already been displayed.
$content
The HTML content generated by Drupal to be displayed.
$directory
The directory the theme is located in, e.g., themes/box_grey or themes/box_grey/box_cleanslate.
$footer_message
The footer message as defined in the admin settings.
$head
HTML as generated by drupal_get_html_head().
$head_title
The text to be displayed in the page title.
$help
Dynamic help text, mostly for admin pages.
$is_front
True if the front page is currently being displayed. Used to toggle the mission.
$language
The language the site is being displayed in.
$layout
This setting allows you to style different types of layout ('none', 'left', 'right' or 'both') differently, depending on how many sidebars are enabled.
$logo
The path to the logo image, as defined in theme configuration.
$messages
HTML for status and error messages, to be displayed at the top of the page.
$mission
The text of the site mission.
$node
(5.x and after only) If you are in page.tpl.php displaying a node in full page view, then $node is available to your template.
$onload_attribute
(4.7 and older only) Onload tags to be added to the head tag, to allow for autoexecution of attached scripts.
$primary_links (array)
An array containing the links as they have been defined in the phptemplate specific configuration block.
$scripts
(5.x and after only) HTML to load the JavaScript files and make the JS settings available. Previously, JavaScript files were hardcoded into page.tpl.php.
$search_box
True(1) if the search box has been enabled.
$search_button_text
(4.7 and older only) Translated text on the search button.
$search_description
(4.7 and older only) Translated description for the search button.
$search_url
(4.7 and older only) URL the search form is submitted to.
$secondary_links (array)
An array containing the links as they have been defined in the phptemplate specific configuration block.
$sidebar_left
The HTML for the left sidebar.
$sidebar_right
The HTML for the right sidebar.
$site
The name of the site, always filled in.
$site_name
The site name of the site, to be used in the header, empty when display has been disabled.
$site_slogan
The slogan of the site, empty when display has been disabled.
$styles
Required for stylesheet switching to work. This prints out the style tags required.
$tabs
HTML for displaying tabs at the top of the page.
$title
Title, different from head_title, as this is just the node title most of the time.

There are also other variables available for your Drupal theme. A good list can be found in Chapter 8 of the essential book Pro Drupal Development. I believe that Chapter 8 is a free sample download.

Also check out these two books from Packt Publishing: Drupal 5 Theming and Building powerful and robust websites with Drupal 6.

More Drupal Theme Documentation

For more information on how to make a Drupal theme, check out the official Drupal Theme Developer's Guide, the PHPTemplate theme engine documentation, and the Themeable Functions list.

How to Make a Custom Front Page in Drupal

There are several ways to set a custom front page in Drupal. Four of them are described here.

Custom Front Page with a Node

In Drupal 5 you can set a custom front page on your Site Information page found at http://example.com/admin/settings/site-information. At the bottom of that page you will see the following option:

Setting a custom front page on Drupal's Site Information Page.

To make a custom front page, just create a new node and then enter the path to that new node in the "Default front page" settings.

For example, if you build a front page on a node at http://example.com/new-front-page, then just enter new-front-page into the "Default front page" settings.

The main SEO issue to be aware of when doing this is that it creates a duplicate front page URL: http://example.com/new-front-page will show the same content as http://example.com/. Possible fixes for that problem are either to:

  1. Block off the duplicate URL with robots.txt, e.g., Disallow: /new-front-page$
  2. or install the Global Redirect Module which will automatically 301 redirect your front page node to the actual front page (http://example.com/).
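For the robots.txt option, a rule can be appended from the command line without adding it twice. This is a sketch: add_disallow is a hypothetical helper, and it assumes robots.txt ends with a newline and has a single User-agent: * group, so that a rule added at the end applies to all crawlers.

```shell
#!/bin/sh
# Sketch only: add_disallow is a hypothetical helper that appends a
# Disallow rule to robots.txt unless the exact rule is already present.
add_disallow() {
  file=$1
  rule="Disallow: $2"
  # -x matches whole lines only; -F treats the rule as a fixed string,
  # so the trailing $ in a path is not read as a regex anchor.
  grep -qxF "$rule" "$file" 2>/dev/null || printf '%s\n' "$rule" >> "$file"
}

# Usage, from the Drupal root directory:
#   add_disallow robots.txt '/new-front-page$'
```

Single quotes around the path keep the shell from interpreting the trailing $.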

Custom Front Page with the Front Module

For more control over your front page, for example to completely override your theme, you could install the Front Page Module. The Front Page Module allows you to set different front pages by user role, as well as include PHP snippets in the front page.

Use of the module is straightforward and is basically self-documenting. A screenshot of the Advanced Front Page Settings from the Front Module is shown below:

Drupal Front Page Module settings

If you use Drupal's Front Page Module, it will create a duplicate front page located at http://example.com/front_page. To address this issue from an SEO standpoint, add the following line to your robots.txt file:

Disallow: /front_page

Custom Front Page with the Views Module

If you have the Views Module installed, it will create a front page view located at http://example.com/frontpage. You can control the settings for that page at http://example.com/admin/build/views.

Drupal Views, custom front page

To edit the View for this type of custom front page, go to http://example.com/frontpage and click on the "Override" tab. Then create a Drupal View just as you would create any other Drupal View.

If you have the Views Module installed, be sure to block off that duplicate front page with the following robots.txt rule:

Disallow: /frontpage$

Custom Front Page with Drupal Theming

To create a custom front page in Drupal through your theme, just create a file in your theme directory called page-front.tpl.php and add the code for your front page there. Whatever you put in that file will be the front page of your Drupal site.
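One way to start that file is to copy the theme's existing page.tpl.php and edit the copy. The sketch below does that; start_front_template is a hypothetical helper, and it stops if a page-front.tpl.php is already present.

```shell
#!/bin/sh
# Sketch only: start_front_template is a hypothetical helper that seeds
# page-front.tpl.php from the theme's existing page.tpl.php.
start_front_template() {
  theme_dir=$1
  if [ -e "$theme_dir/page-front.tpl.php" ]; then
    echo "page-front.tpl.php already exists" >&2
    return 1
  fi
  cp "$theme_dir/page.tpl.php" "$theme_dir/page-front.tpl.php"
}

# Usage (mytheme is a placeholder theme name):
#   start_front_template sites/all/themes/mytheme
```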

CSS Poster and Drupal Template Customization

CSS Poster is a free online tool from chami.com. Just upload a CSS file, and CSS Poster will make a chart based on your CSS file.

Here is a cropped example using Drupal's Bluemarine template's CSS file:

CSS Poster used to display Drupal's Bluemarine Template CSS

A potential use for CSS Poster that immediately comes to mind is using it as a reference when customizing Drupal themes, or any kind of CSS file that was created by someone else.

Drupal Theming Tutorials

I've found some great Drupal templating tutorials online.

NickLewis.org has a series about Extreme Drupal Theming with PHPtemplate. Here is one tutorial from that series on how to customize the login form.

Bryght.com has a great introduction to converting a CSS/HTML design to a Drupal theme. Bryght.com also has an interesting tutorial on how to create a contact form in Drupal with the survey module. The survey module can be found here. I haven't tried that method of creating contact forms, but it looks interesting.

The drupal.org phptemplate docs are also very good. The PHPTemplate Theme Snippets section has useful Drupal template recipes.

Drupal Editors

After trying many different text editors and IDEs I've settled on Vim. If you don't already use Vim, I highly recommend checking it out. It has a bit of a learning curve, but it's worth it because it makes text editing much easier.

Also, because you will probably be doing some Drupal work over SSH, Vim is ideal because it is most likely available when accessing your server over SSH.

If you are new to Vim, you might want to start with Cream—an easy to use version of Vim. Also work through some Vim tutorials. Print out some Vim cheatsheets and paste them next to your desk, or keep them available on your desktop. After getting used to the commands, switch from Cream to Gvim (shown below, running on Ubuntu GNU/Linux).

Gvim and Drupal

If you already use Vim, check out the configuring Vim for Drupal page on Drupal.org. It has some great tips.

I normally set my Vim to indent with four spaces. However, Drupal coding standards require only two spaces. One neat trick mentioned on the Vim for Drupal page is that you can place the following modeline at the bottom of any of your source code files and Vim will automatically apply those settings to the file every time you open it:

// vim: set filetype=php expandtab tabstop=2 shiftwidth=2 autoindent number smartindent:

(My version is slightly different than the one on Drupal.org because I've added number which tells Vim to add line numbers to the file when I open it.)

For more information on modelines, type :help modeline into Vim.

If you prefer another editor, Drupal.org also has information on configuring Emacs and Eclipse for Drupal.

If you are looking for a simpler kind of text editor just as a replacement for Windows Notepad, maybe try HTML Kit (Windows-only, though possible to run on Linux with WINE), PSPad (Windows), SciTE (Windows or Linux), or Notepad++ (Windows-only). If you are using Linux, try Quanta Plus. I don't have a Mac so I'm not sure what editors are available for it other than Emacs and Vim.

Drupal Resources

If you haven't already found these resources, check them out:

Drupal Security - the CHANGELOG.txt file

I've been having a lot of problems lately with WordPress sites getting hacked. So far no one has hacked any of my Drupal sites.

Every Drupal installation comes with a CHANGELOG.txt file in the root directory. It gives hackers a potential way to see what version of Drupal you're running. I recommend changing the name of CHANGELOG.txt to something that can't be guessed. Don't delete the file completely, because it lets you know what version of Drupal you're running.
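A hard-to-guess name can be generated rather than invented. The following is a sketch only: hide_changelog is a hypothetical helper, and it assumes /dev/urandom and od are available on the server. It prints the new file name so you can note it down.

```shell
#!/bin/sh
# Sketch only: hide_changelog is a hypothetical helper that renames
# CHANGELOG.txt to a name with a random suffix.
hide_changelog() {
  root=$1
  # Read 8 random bytes and format them as 16 hexadecimal characters.
  suffix=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  mv "$root/CHANGELOG.txt" "$root/CHANGELOG-$suffix.txt" &&
    printf '%s\n' "CHANGELOG-$suffix.txt"
}

# Usage, from the Drupal root directory:
#   hide_changelog .
```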

I think that CHANGELOG.txt should automatically be renamed by Drupal during installation, but it has already been discussed and the discussion has been closed. So for now the file has to be renamed manually.

Drupal Security Team Updates

It's also wise to make sure you're running the latest release of Drupal. You can sign up for Drupal security announcements. Drupal 5.10 was released today and I got an email from them right away.

If you find any security holes in Drupal, report them to the Drupal security team.

Drupal Taxonomy Tutorial

One of Drupal's best features is taxonomy. This tutorial gives an introduction on how to use Drupal's taxonomy system.

New users to Drupal sometimes find taxonomy difficult to understand at first. I recommend reading all the way through this tutorial once. If the first part doesn't make sense, I hope that it will by the end.

Taxonomy can be thought of as categories or tags. Drupal taxonomy is made up of vocabularies and terms. A vocabulary is a set of terms, and "term" is just another word for category.

The image below is from the taxonomy (categories) page located at http://example.com/admin/content/taxonomy:

Drupal taxonomy settings

Let's build the taxonomy for a news site as an example. First click on the Add vocabulary tab. You will then see a screen that looks something like the following image:

Drupal taxonomy vocabulary 1

For Vocabulary name type in News. For Description enter Categories that fall in the news section of the site. For Help text, enter Please choose a section of the Web site for this news story.

Continuing down the page, you will see something like the following image. It will probably have fewer options than what you see below. The example below is from a site that has many different content types. If you are using a new Drupal installation, you might just see two content type options: Page and Story. For this tutorial, just check the box for Story and leave the rest unchecked. That means that only Story nodes can use categories from the News vocabulary (remember that a vocabulary is a collection of terms or categories).

Drupal taxonomy vocabulary 2

The next part of the page has additional options:

Drupal taxonomy vocabulary 3

Check the option Single hierarchy. That means that categories will be able to have a parent-child relationship, with only one parent for each child term. I'll explain that concept below.

Check the option Required. That means that every story node (news page) will have to be placed in one category of the site.

Leave the other options unchecked for now. Here is a quick explanation in case you would like to use them on your site:

Hierarchy: Disabled
This means that categories do not have a parent-child relationship (sub-categories). All categories are on the same level.
Hierarchy: Single
Single Hierarchy means that your categories can have parent-child relationships. Each child can only have one parent category. For example, you might have a category called "Widgets", and then several subcategories called "Green Widgets", "Blue Widgets", and "Red Widgets". The subcategories could only be children of the "Widgets" category, and not subcategories of any other category.
Hierarchy: Multiple
Multiple Hierarchy means that your categories can have parent-child relationships and that each child can have multiple parent categories. For example, if you have two categories called "Cheap Widgets" and "Expensive Widgets", your subcategories, "Green Widgets", "Blue Widgets", and "Red Widgets", might be subcategories of both "Cheap Widgets" and "Expensive Widgets".
Related Terms
You can use this to choose related categories for each term, but I don't know if it has any use in Drupal's default setup unless you are writing custom PHP code.
Free Tagging
Free tagging allows you to enter categories on the node creation pages instead of choosing them from a list. Drupal uses AJAX to suggest possible categories that already exist in the database. This option is useful when you want to give your users freedom to define their own categories—for example adding tags to blog posts.
Multiple Select
This option allows you to select more than one category for a node.
Required
This option makes it so that you can't create a node without choosing a category for it.
Weight
The Weight controls the order that vocabularies (sets of categories) are displayed in.

After you have submitted the form as explained above, you will see something like the following screen:

New vocabulary added

The vocabulary has been created, but there are no terms (categories) in it yet. Click on the Add terms link.

Add a new term

For Term name enter Local. For Description enter Local News. Then click the submit button. You will be directed back to the same form where you can add another category. For Term name enter World. For Description enter World News.

After you have created the first two terms, create the third, but this time choose a parent category Local from the drop-down select box as shown below:

Choose a parent term

Name the term Sports. That will be the category for the local sports section.

Continue to add terms, in a hierarchical structure as shown in the outline below:

It will look something like the following image:

Taxonomy hierarchy in Drupal

Pathauto Module and Taxonomy

For optimal results, I recommend also setting up the Pathauto Module. Using the Pathauto Module with Taxonomy will allow you to automatically generate custom URLs that put each news item in a different section of the Web site.

After you have the Pathauto Module installed, enter the following setting for Story Paths:

Drupal's Pathauto and Taxonomy combined

The settings [vocab]/[catpath]/00[nid]-[title] will create news URLs that look like http://example.com/news/local/weather/0025-page-title and were chosen for the following reasons:

[vocab]
This will be replaced with the name of your vocabulary— in this case news.
[catpath]
This will be replaced with the category and all of the parent categories. For example, a post in Local—>Weather will have a catpath of local/weather.
00[nid]-[title]
This might be a little bit controversial, but there is a reason for doing it. If you would like to someday be included in Google News you need a unique number of at least three digits in your news URLs. [nid] is the node ID which is unique for each node. Padding it with two leading zeros guarantees that the number will always be at least three digits. The [title] is the title of your post, and it is optional for people who want to include keywords in their URL.
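To make the pattern concrete, here is a rough imitation of how the placeholders expand. It is an illustration only, not Pathauto's actual code: news_path is a hypothetical helper, and the title cleanup (lowercase, with hyphens between words) is an assumption about the result.

```shell
#!/bin/sh
# Sketch only: news_path is a hypothetical helper that imitates the
# pattern [vocab]/[catpath]/00[nid]-[title]. It is not Pathauto's code.
news_path() {
  vocab=$1 catpath=$2 nid=$3 title=$4
  # Lowercase the title, turn runs of other characters into hyphens,
  # and trim any hyphen left at either end.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  printf '%s/%s/00%s-%s\n' "$vocab" "$catpath" "$nid" "$slug"
}

news_path news local/weather 25 "Page Title"
# prints: news/local/weather/0025-page-title
```

Even for node 7 the number portion comes out as 007, which is why the two leading zeros are enough.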

Now you have set up Drupal to have a news section that will automatically organize your content by category and generate nice URLs for you.

If you also want blogs on your site you could create a second vocabulary called Tags and choose the freetagging option. Also choose Disable Hierarchy. Make the vocabulary required by checking the relevant checkbox on the Add Vocabulary page. Associate the Tags vocabulary only with the blog post content type. Choose a URL structure in Pathauto—perhaps something like blogs/[user]/[cat]/[title] or blogs/[cat]/[title].

If you have questions, please leave a comment below. See also the Pathauto Tutorial for more information about how to make Taxonomy and Pathauto work together.

Further Reading

For more about Drupal Taxonomy, see the Drupal Taxonomy Handbook.

If you are interested in actually building a news site in Drupal, check out the writeup of the New York Observer project.

Drupal's Twitter Module Test

I've added this site to Twitter at http://twitter.com/webmastertips.

Drupal has a Twitter Module, but it doesn't come with any documentation. This post is a first test of Drupal's Twitter Module on this site. If it works, I'll follow up on this post.


UPDATE: The Twitter Module for Drupal works. Here is some documentation for it:

  1. Download the Twitter Module from Drupal.org.
  2. Upload it to your /sites/all/modules/ directory.
  3. Enable the module in your Drupal admin section.
  4. Click on My account in your navigation menu and then on the edit tab.
  5. Enter your Twitter username and password, and the text format like this:
    Adding Drupal posts to Twitter

Once you have followed those five steps, every time you post to your Drupal site from that user account it will add the post to your Twitter account.

An alternate method for auto-posting to Twitter that I prefer is Twitterfeed. The difference between the Drupal Twitter module and Twitterfeed is that the Drupal module allows all your users to post their own content to their own Twitter profile. Twitterfeed is a third-party service that has to be set up manually for each feed.

By the way, check out this comparison of icons from Drupal (top) and Twitter (bottom).

twitter drop or cloud

drupalicon

How to Backup a Drupal Site

Backing up a Drupal site has always been a tedious process. I've just learned about an easier way.

The Old Way

In the past I've always used this method to back up a Drupal site:

  1. Back up the website over SSH
  2. Figure out how to log in to the client's web hosting control panel, and either run mysqldump over SSH or use PHPmyAdmin.

The Easier Way

(I learned about this method from Ferenc's travel photographer blog. His Photography Directory is built in Drupal.)

Download and install Drupal's Backup and Migrate Module. Then go to admin > content > Backup & Migrate and follow the directions.

Basically, you can just click a button and back up your entire Drupal database with only the data you want to keep. The unnecessary data from the cache and accesslog tables isn't backed up, so the backup file is smaller than it would be with a plain mysqldump.
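For comparison, here is a rough sketch of how the same idea (keep every table's structure, skip the rows of the disposable tables) could be approximated with plain mysqldump. It only builds the two command lines and does not run them. The database name drupal_db and the list of tables to skip are assumptions for illustration, and this is not how the Backup and Migrate module works internally.

```python
# Sketch: build mysqldump commands that keep all table structures
# but skip the row data of disposable tables (cache, accesslog).
# The database name and the table list are illustrative assumptions.

SKIP_DATA_TABLES = ("cache", "accesslog")


def build_dump_commands(database, skip_tables=SKIP_DATA_TABLES):
    """Return two argument lists: a structure-only dump and a data-only dump."""
    # Pass 1: structure of every table, no rows at all.
    structure_cmd = ["mysqldump", "--no-data", database]

    # Pass 2: rows only, ignoring the tables whose data is not worth keeping.
    data_cmd = ["mysqldump", "--no-create-info"]
    for table in skip_tables:
        data_cmd.append("--ignore-table={}.{}".format(database, table))
    data_cmd.append(database)

    return structure_cmd, data_cmd


if __name__ == "__main__":
    for cmd in build_dump_commands("drupal_db"):
        print(" ".join(cmd))
```

Restoring would mean loading the structure dump first and the data dump second. Credentials are left out on purpose; they would normally come from your MySQL client configuration.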

Here's a screenshot:

drupal-backup-and-migrate.png

There are also options to compress the file before download, or add a datestamp:

drupal-backup-settings.png

The Backup & Migrate module is one of the best Drupal modules. I can't imagine going back to the old method. It's great because even clients can back up their own websites. This module doesn't back up the files for you, but it's quick to back up the files over SSH, or just download them over SFTP.

How to Fix Drupal's 'Create Content' Link Problem

Sometimes the Drupal menu will have a bug where the create content link will appear even for users that aren't logged in. Here is a screenshot of the problem:

Drupal's create content menu bug

Fixing the problem should just involve going to your menu settings page (http://example.com/admin/build/menu) and clicking on the reset link next to the create content link as shown in the image below:

Fixing the Drupal create content menu problem

That will reset that menu item and the create content link should then be invisible to anonymous users.

How to Make a Drupal Forum

Drupal's built-in forums are not ideal in all situations, but they make it easy to add a quick forum to your Drupal site.

The default Drupal forums are not very attractive, but they can be improved with a little CSS modification. Here are a couple of examples of great Drupal forums:

The Drupal forum tutorial

Go to administer —> modules and enable the forum module.

Go to administer —> forums and you will see a screen like this, although your theme may be different. In these first examples I am using the Sands theme.

Drupal forums default page

The first thing you will do is add containers. Containers are the categories that hold your forums. Containers are optional, but this tutorial shows how to use them. In this example I am going to make three sections of the forum: one section for Linux discussions, one for Windows, and one for Mac. This is the Add Container screen:

Adding a container to the forums

After you have made your containers, add your forums. In the following screenshot, I am making a forum section for Damn Small Linux. Notice how I am adding the Damn Small Linux forum to the Linux container with the Parent select box.

Adding a forum to a container

After you have added some containers and forums, you will have something that looks like the following screenshot:

The final Drupal forums

How to Style Your Drupal Forums

The look of the forums is controlled by CSS in your style.css file in your current theme directory. You can see your styles with the Firefox Web Developer Toolbar — an essential web design tool. On the Web Developer Toolbar menu, go to Information —> Display ID & Class Details and the CSS IDs and classes will become visible on the web page as shown below:

Web Developer Toolbar showing the CSS IDs and classes on the web page

Here is a closeup, in case the previous image is hard to read:

Closeup of the previous image

The container rows have the class .container and the forum rows have the class .forum. You can find the rules for those classes in the style.css file and change them there.

The Sands theme is a little harder to modify because the CSS is spread across 5 different files, so I'm going to finish the rest of the tutorial using the Bluemarine theme. Here is the relevant CSS from the Bluemarine style.css file with the .container class highlighted:

Bluemarine style sheet

Just make your modifications there and reload the browser to see the changes. Here is an example of some quick changes to the CSS:

CSS changes

And here are the simple results, showing more contrast between the containers and forums:

The new forum

A few more CSS modifications:

A few more CSS modifications

Here is the result of those final changes. Note how the container names and descriptions are on one line now instead of two lines:

More CSS modifications

Further Reading

If you want to take your Drupal forum a little further, try installing the Flatforum Module. There are some Drupal forum recipes and discussions here and here, as well as throughout the Drupal.org site (use the search function).

It is possible to import other forum systems into Drupal. Check the Drupal modules page for other forum-related projects such as Drupal vBulletin integration, FUDforum integration, Comment Mover, and more.

The reason that I tend to avoid integrating other forum systems into Drupal is that I want the forums to completely integrate with the rest of the site and to be searchable with the Drupal search form.

Drupal forums don't have as many features as other systems like vBulletin; their main advantage is that they are completely integrated into the rest of the CMS.

Drupal PHP Snippet: How to Check if a Visitor is Logged In

A common question when using Drupal is "How do I show something only to users that are not logged in?" (or vice versa). This is the simple answer for snippets of content:

<?php

// You must include this next line
global $user;

// if visitor is not logged in
if (!$user->uid) {
  // do something if not logged in
} else {
  // do something else if logged in
}

?>

I see many people searching this site for the answer to that. I hope someone finds it useful.

Drupal Pathauto Tutorial

Drupal's Pathauto Module is a great module for SEO. It automatically generates URL aliases for nodes based on highly-configurable rules.

This tutorial starts by introducing Drupal's URL aliases, and then moves on to configuring Pathauto. I'm writing this tutorial for Pathauto version 5.x-2.1, but the basic concepts should apply to other versions also.

First, download the Pathauto module and the Token Module. Both are needed to make Pathauto work. I also recommend installing the related Global Redirect module for reasons mentioned in my Drupal SEO tutorial.

Drupal URL Terminology

Drupal has internal URLs with structures like node/123, user/5, and taxonomy/term/1. If you build a Drupal site and don't use the Path or Pathauto modules, your URLs will look like example.com/node/123.

Drupal's built-in Path module lets you override those default URLs with URL aliases. A URL alias is just an alternate URL that will load an internal Drupal URL.

For example, if you have a node (page) at http://example.com/node/123, you can manually create a URL alias with the Path Module so that http://example.com/green-cars is used instead of http://example.com/node/123. (Side note: the reason for installing the Global Redirect Module above is to prevent duplicate content by creating redirects to the URL aliases.)

The Pathauto Module is a module that automatically generates URL aliases based on your custom Pathauto settings.

Pathauto Settings

After you install Pathauto, go to the Pathauto settings page at admin –> pathauto.

Your Pathauto settings page may look different, depending on which modules you have installed:

The settings page for the Drupal Pathauto Module

Click on General Settings to open that section. In this basic Pathauto tutorial, you can leave everything at the default setting except for this section:

General Pathauto Module settings

For Update action, be sure to choose "Do nothing. Leave the old alias intact." That will prevent your URLs from accidentally changing if you change the title of an already published node.

I have some sites with characters in foreign languages so I check the box that says "Transliterate prior to creating alias". That means that if you use non-English characters in your post titles, Pathauto will convert them to their equivalents instead of replacing them with a dash.

You won't be able to check the box until you make an i18n-ascii.txt file, which is as easy as renaming the one that comes in the Pathauto directory (just remove the word "sample" from the existing file's filename). There is a page on Drupal.org with information about customizing your transliteration file if you need something else from it.

I also check the box that says "Reduce strings to letters and numbers from ASCII-96" because search engine crawlers are not very smart and I like to keep things simple for them.
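To make the effect of those two checkboxes more concrete, here is a minimal sketch of what "transliterate, then reduce to plain letters and numbers" does to a title. It is not Pathauto's code: Pathauto uses the mappings in i18n-ascii.txt, while this sketch assumes Unicode decomposition is good enough, so characters without an ASCII base letter are simply dropped. The function name make_alias is made up for the example.

```python
import re
import unicodedata


def make_alias(title, separator="-"):
    """Turn a post title into a lowercase ASCII URL alias (illustrative only)."""
    # Decompose accented characters (e.g. "é" becomes "e" plus a combining
    # accent), then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")

    # Collapse every run of non-alphanumeric characters into one separator.
    alias = re.sub(r"[^A-Za-z0-9]+", separator, ascii_only)
    return alias.strip(separator).lower()


if __name__ == "__main__":
    print(make_alias("Café Olé!"))
    print(make_alias("This is my post"))
```

Without the transliteration step, the accented letters would be treated like punctuation and replaced by the separator, which is the behaviour described above.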

The only setting that you need to change in the Punctuation Settings section is the one for Hyphen. Change it to No action (do not replace) as shown below:

Pathauto hyphen

The only reason for that is to remove an error message that will otherwise appear.

Pathauto Rules

Each content type in Drupal can have a different set of Pathauto URL rules. The URL rules can be based on things like the taxonomy term ("category"), node ID number, node title, date or time, author name, etc.

Here is a sample list of the tokens that you can use in URLs. You can view the list by clicking on "Replacement patterns" on the Pathauto configuration page:

a list of available Pathauto tokens

You then enter the above-mentioned replacement patterns in your Pathauto rules for each content type:

Pathauto rules settings for Drupal

In the screenshot above I'm creating very "non-Drupal" URL paths. More common would be to leave out the [nid] and the .html file extension.

Pathauto Warnings

There is a page on Drupal.org about dangerous Pathauto patterns. You can avoid all of those potential problems by making sure that you never use [title-raw] as the only token in a Pathauto pattern. There should always be more data in the URL than just the title; otherwise people could authenticate your site in Google Webmaster Tools or create a URL alias that conflicts with an existing system path.

The thing you don't want is for someone to be able to create a node called "This is my post" and have the resulting URL be example.com/this-is-my-post. You should have a little more data in there.
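The sketch below illustrates why a title-only pattern is risky. It fills in a pattern from a dictionary of token values and checks whether the result starts with a path that Drupal already uses. The token names follow the ones mentioned in this tutorial; the helper functions and the short list of reserved paths are assumptions for the example, not part of Pathauto.

```python
# Illustrative subset of paths that Drupal itself already uses.
RESERVED_PATHS = {"admin", "user", "node", "taxonomy"}


def apply_pattern(pattern, tokens):
    """Replace each [token] in the pattern with its (already cleaned) value."""
    alias = pattern
    for name, value in tokens.items():
        alias = alias.replace("[" + name + "]", value)
    return alias


def conflicts_with_system_path(alias):
    """True if the first path segment of the alias is a reserved path."""
    return alias.split("/")[0] in RESERVED_PATHS


if __name__ == "__main__":
    # A user creates node 123 with the title "admin".
    tokens = {"title-raw": "admin", "nid": "123"}
    for pattern in ("[title-raw]", "[nid]-[title-raw]"):
        alias = apply_pattern(pattern, tokens)
        print(pattern, alias, conflicts_with_system_path(alias))
```

With the title-only pattern the alias collides with the admin path; adding the node ID in front is enough to avoid the collision.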

Examples of extra data that you could add to the Pathauto rules are:

Further Assistance

Drupal's Pathauto Module can be difficult to configure, but it is one of the modules that makes Drupal a great CMS. It's often a good idea to test your Pathauto rules on a sandbox site until you figure out patterns that you like, and only then put them on a live site.

If you have questions about configuring Pathauto, please leave a comment below.

Drupal SEO

I've decided to organize a section of this Web site around posts related to Drupal SEO. I'll be adding a couple of other tutorials in the next week or two. In the meantime, check out the previous Drupal search engine optimization articles below.

If you are completely new to search engine optimization, read and bookmark the intro to SEO page, and check out SEO Elite Software.

I also offer Drupal SEO consulting services.

Drupal SEO Consulting Services

I offer two Drupal-related SEO services:

  1. SEO Site Audits
  2. SEO Campaigns

I've written some of the most comprehensive Drupal SEO tutorials on the Web, including the Drupalzilla robots.txt tutorial and the basic Drupal SEO tutorial that is at the top of Google for drupal seo right after Drupal.org.

SEO Site Audits

An SEO Site Audit is a detailed analysis of your site's configuration and structure, and it contains recommendations on optimizing your SEO with Drupal-specific tips.

SEO site audits are delivered in PDF format, with 2 hours of consulting beyond the delivery of the site audit. A typical site audit is between 35 and 50 pages in length. SEO site audits are done at a flat rate of $2000 USD.

SEO Campaigns

An SEO Campaign is a longer consulting agreement of 6 or more months where I work with your Web developers to systematically increase traffic through comprehensive search engine and social media optimization techniques.

For more information, please inquire through the form below:

SEO and SMO traffic

Basic Drupal SEO: On-site Optimization

NEW! This Drupal SEO tutorial has been updated and rewritten in May 2008.

Drupal is a great open source GPL content management system. With a few modifications it can be configured for excellent on-site search engine optimization. This tutorial only covers the very basics of on-site optimization. It will make sure that search engines are able to spider your site, and prevent some common Drupal SEO errors.

This is just a basic introduction to configuring a Drupal site for good search engine rankings. Other tutorials will go into more depth.

Summary

  1. Enable Clean URLs
  2. Enable Path Module and install and enable Pathauto, Global Redirect and Token Modules.
  3. Configure the Pathauto Module
  4. Install and enable the Meta Tags Module.
  5. Install and enable the Page Title Module.
  6. Do NOT install the Drupal Sitemap Module.
  7. Fix .htaccess to redirect to "www" or remove the "www" subdomain.
  8. Fix your theme's HTML headers if they aren't right
  9. Recommended: create a custom front page
  10. Modify your robots.txt file.

Drupal and Clean URLs

Enable clean URLs

Search engines prefer clean URLs. In Drupal 6, clean URLs should be automatically enabled if your server allows it. In Drupal 5 you can enable clean URLs under administer —> settings —> Clean URLs. Clean URLs are necessary for the pathauto module, mentioned below.

Drupal Modules for SEO

Install the pathauto module and enable it

The pathauto module is highly recommended. Pathauto will automatically make nice customized URLs based on things like title, taxonomy, content type, and username. You also have to enable the path module for pathauto to work.

Think carefully about how you want your URLs to look. It takes some experience with Drupal to get the exact URL paths that you might want. The URLs are controlled by a combination of taxonomy and pathauto, and I hope to cover that in another tutorial. You can also use the path module to write custom URLs for each page, but that might become tedious and inconsistent on a large site.

At the very least, enable the path module and install the pathauto module. It will generate nice-looking URLs for you without much configuration.

Caution: The above advice is directed towards new Drupal sites. If you have an existing Drupal site, be very careful that you don't rename your existing URLs with the pathauto module. It is generally a very bad idea to change existing URLs because the search engines will no longer be able to find those pages.

Here are some pathauto settings to watch out for:

For update action choose "Do nothing. Leave the old alias intact." Otherwise the URLs of nodes will change every time you change the title of your post, causing problems with search engines:

Drupal Pathauto update action settings

There is also a more comprehensive Pathauto tutorial.

Install the Global Redirect Module

The Global Redirect Module will automatically do 301 redirects to your URL aliases. So if you have a node at example.com/node/5, the Global Redirect Module will redirect that URL to your alias at example.com/my-page.

Read more about the Global Redirect Module.

Install the Meta Tags (Nodewords) Module

The Meta Tags Module (formerly called "Nodewords Module") can be highly beneficial to your site. There is a myth in some search engine optimization circles that says, "meta tags are not important". This is not true.

Meta tags are not meant to be used for keyword stuffing. Don't use them for that purpose because it isn't going to help you. The really important meta tag is the meta description.

The meta description should be different on every page for best results. The meta description should be one or two brief sentences to summarize the page. It should be written for your human visitors, but it is not a bad idea to tastefully and sparingly insert a couple of your keywords. Often when a search engine lists your site in the search engine results pages, it will use your page's HTML title for the title, and your meta description for the text snippet. That is why the meta description should be written with human visitors in mind. You want a text snippet that is going to make them want to click on the link.

Here is one textbook example from this site in the Google SERPs:

Drupal meta description being used as a text snippet in Google

I generally configure the Drupal Nodewords module to output the meta description and meta keywords on every page. I have a few default keywords set, and add a couple more on every post to make a unique combination of relevant keywords. I don't spend much time with it because I don't think the meta keywords are that important.

On the nodewords module's administration page, be sure to check the box that says "Use the teaser of the page if the meta description is not set?". That way each page will get a unique meta description even if you have denied access to create custom meta tags for nodes to some users.

Install the Page Title Module

The Page Title Module allows you to set custom page titles on every page. Highly recommended.

Google Sitemaps Module

Google Sitemaps are not essential, but I've been adding them to my Drupal sites. I think that Google Sitemaps were created by Google primarily for debugging Googlebot and not for the benefit of search engine optimizers.

There is a Drupal Sitemap Module, but the last time I checked it had serious bugs that made it unusable. In any case, I don't think that most Web sites need XML sitemaps. Other SEOs have similar opinions about sitemaps.

I recommend not using the Drupal Sitemaps Module.

Drupal Rewrite Rules

Make sure that your site does a permanent (301) redirect in either of the following two ways:

  • http://example.com to http://www.example.com, or
  • http://www.example.com to http://example.com

You can set up this redirect in your .htaccess file.

To remove the www from your site, look for the following code in your .htaccess file and uncomment and adapt:

  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # uncomment and adapt the following:
  # RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
  # RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

To redirect to the www version of the site, look for the following code and uncomment and adapt:

  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
  # RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
  

Be sure to replace example.com with your domain name, and then test the redirects in a browser.
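If the two rewrite lines are hard to read, the following sketch mirrors the logic of the "WITH the www prefix" rule in Python, so you can see which requests would be redirected and where. It is only a model of the rule for example.com, not a replacement for testing the real site, and the function name is made up for the example.

```python
import re


def redirect_to_www(host, path):
    """Model of the 'add www' rule: return (status, target URL) or None."""
    # RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    # The [NC] flag makes the host comparison case-insensitive.
    if re.match(r"^example\.com$", host, re.IGNORECASE):
        # RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
        # In .htaccess the matched path has no leading slash.
        return 301, "http://www.example.com/" + path.lstrip("/")
    # Any other host name is left alone by this rule.
    return None


if __name__ == "__main__":
    print(redirect_to_www("example.com", "/node/5"))
    print(redirect_to_www("www.example.com", "/node/5"))
```

Note that the condition matches only the bare domain, so other subdomains pointing at the same site would not be redirected by this rule.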

Fix Your HTML Headers

There should be one <h1> header element on every page and it should have your keywords in it.

  1. Enclose your site name in DIV tags, not HTML header tags.
  2. I would add one H1 element to the home page.
  3. On teaser views, the node titles should be enclosed in H2 tags, while the main header of the page (e.g., taxonomy term name) should be enclosed in H1 tags.
  4. On node view pages, the node title should be enclosed in H1 tags.

Duplicate Content from /node

By default, the front page of a Drupal site has nearly identical content to the page at /node. Search engines are going to spider and index /node because on the paginated home page view, the link to the first page in the series points at /node.

The fix for this is simple — always use a custom front page when building a Drupal site.

Drupal PHP Session IDs

I haven't seen this problem on Drupal sites in a long time, but if you see PHP session IDs in your URLs, it is very bad for search engines. They have to be removed if you want search engines to be able to spider your site well. A PHP session ID in your URL might look something like this: ?PHPSESSID=37765439acbd6c12345ee987776e65be.

From what I understand, this is the fix if your server supports mod_php — it goes in your .htaccess file:

# Fix PHP session ID problems in Drupal
php_value session.use_trans_sid 0
php_value session.use_only_cookies 1

Otherwise you can probably fix it by modifying your php.ini file (or creating one). I don't know the exact procedure for every host, only that your web site must not have PHP session IDs in the URLs if you want good spidering by search engines. Search Drupal.org or Google for how to turn off PHP session IDs on your server.
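If you are not sure whether your site has this problem, you can scan a list of URLs (from a crawl or your server logs) for the session parameter. This is a minimal sketch that assumes the parameter uses PHP's default name, PHPSESSID; the function name is made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def has_session_id(url, param="PHPSESSID"):
    """True if the URL's query string contains the PHP session parameter."""
    query = urlsplit(url).query
    # keep_blank_values so that "?PHPSESSID=" with an empty value still counts.
    return param in parse_qs(query, keep_blank_values=True)


if __name__ == "__main__":
    urls = [
        "http://example.com/node/5?PHPSESSID=37765439acbd6c12345ee987776e65be",
        "http://example.com/node/5?page=2",
    ]
    for url in urls:
        print(url, has_session_id(url))
```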

Drupal and Robots.txt

The default Drupal robots.txt file has critical errors in it even in Drupal 6.2 (bug report already filed).

Read this Drupal robots.txt tutorial for more information.

Watch out for contributed modules that create duplicate content through extra URLs. This can be a serious problem.

Further Reading

To learn more about search engine optimization, check out the SEO resources page.

Drupal and Canonical URLs

Google Engineer Matt Cutts talks about canonical home page URLs on his blog. The concept is basically this:

For the most part, search engines view different URLs as being entirely different pages. So the following URLs may all show the same content, but search engines will often see them as different pages with duplicate content:

  1. http://example.com
  2. http://www.example.com/
  3. http://example.com/index.php
  4. and so on...

Drupal does not link to its index.php file, so the third URL example is generally not an issue with Drupal. However, you should choose between using the www version of the domain name or the non-www version of the domain name. Drupal makes this easy by providing instructions in the default .htaccess file as shown below:

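# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# adapt and uncomment the following: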
# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
#
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

When setting up your Drupal site you should decide whether you want your site to have the www subdomain or not and choose one of the two options in the .htaccess file.

For SEO purposes it doesn't matter either way unless your site has already been live for a while. If Google has indexed your site and shows the PageRank value of the home page (as seen in the Google Toolbar or through a Firefox Extension like Search Status), then Google has already chosen one version or the other for your domain name. In that case I would redirect to the version of the domain that Google has already accepted. You can determine which version Google has chosen by typing your domain name into Google without the www like this: example.com

Google should show your domain name at the top of the SERPs. If Google shows your home page with the www then you should redirect your site to the www-version. If they leave off the www then redirect to the version without the www.

Some people would say that it doesn't matter which one you redirect to even if Google has already indexed the site. But sometimes when you 301 redirect pages or sites, Google drops the original URL and it takes a while to get it ranked again. That is why I recommend going with the choice that Google has already made for you.

Drupal Modules SEO

The old Drupalzilla.com site had a database of Drupal modules with tips for SEO. I've copied some of the information into the pages below.

Abuse Module

The Abuse Module allows users to flag content as spam. It outputs an extra link at the bottom of teasers and full nodes.

Drupal's Abuse Module

The image above shows the link that is created at the bottom of nodes and comments that allows users to flag content for review by the moderators. The URLs that are linked-to have the structure http://example.com/abuse/report/comment/347. If you have a node with 15 comments, the Abuse Module will create 16 extra pages on the site, 15 abuse form pages for the comments and one for the node.

To fix this problem, the following rule should be added to your robots.txt file when using the Abuse Module:

Disallow: /abuse/

Forum Module

The Forum Module is part of Drupal core. If you enable the Forum Module, there are some things to be aware of.

Whenever Web sites create tables that are sortable by column headers, you are looking at potential duplicate content.

Drupal's Forum Module

The image above shows table headers in Drupal's Forum Module. When you click on one of those links, it re-sorts the data in the table by appending parameters to the original URL.

In the example image above, the original URL structure is http://example.com/forums/introduce-yourself. Drupal's Forum Module creates the following additional URLs in the header links:

Link Text     URL
Title         http://example.com/forums/introduce-yourself?sort=asc&order=Topic
Replies       http://example.com/forums/introduce-yourself?sort=asc&order=Replies
Created       http://example.com/forums/introduce-yourself?sort=asc&order=Created
Last Reply    http://example.com/forums/introduce-yourself?sort=desc&order=Last+reply

After visiting those pages you (and spiders) will also find the following URLs:

Link Text     URL
Title         http://example.com/forums/introduce-yourself?sort=desc&order=Topic
Replies       http://example.com/forums/introduce-yourself?sort=desc&order=Replies
Created       http://example.com/forums/introduce-yourself?sort=desc&order=Created
Last Reply    http://example.com/forums/introduce-yourself?sort=asc&order=Last+reply

Pagination of the forums makes it even worse because each page can then be sorted in these 8 ways. Here is one example from Drupal.org: http://drupal.org/forum/2?sort=asc&order=Last+reply&page=393.

SEO Recommendation for the Forum Module

The recommended fix for this problem is to add the following line to the robots.txt file:

Disallow: /*sort=
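To see what a wildcard rule like this actually matches, here is a small sketch of the pattern matching that Google documents for robots.txt paths: * matches any run of characters and a trailing $ anchors the end of the URL. It only looks at Disallow patterns and ignores Allow rules and rule precedence, so treat it as a way to sanity-check a pattern rather than as a full robots.txt parser. The function names are made up for the example.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if the path (including its query string) matches any pattern."""
    return any(robots_pattern_to_regex(p).search(path) for p in disallow_patterns)


if __name__ == "__main__":
    rules = ["/*sort="]
    for path in (
        "/forums/introduce-yourself",
        "/forums/introduce-yourself?sort=asc&order=Topic",
        "/forums/introduce-yourself?sort=desc&order=Last+reply",
    ):
        print(path, is_blocked(path, rules))
```

The plain forum URL stays crawlable, while every sorted variant is blocked.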

SEO Recommendations for Drupal Core

The following line should be added to the default Drupal robots.txt file because the Forum Module is distributed with Drupal:

Disallow: /*sort=

Forward Module

The Forward Module adds a link to each teaser and full node that allows users to email the node to people.

Drupal's Forward Module

The image above shows the link that is created on every node. The URL structure of the link is http://example.com/forward/343. If your site has 3000 nodes, the Forward Module will create 3000 extra pages with nothing but a form that allows people to email the nodes to their friends.

To fix the issue, add the following line to your robots.txt file:

Disallow: /forward/

Global Redirect Module

The Global Redirect module has three main features:

  1. If a requested URL has a URL alias, Global Redirect will do a 301 redirect to the URL alias.
     For example, if you have a URL alias for node 25 called page-title, the Global Redirect Module will do a 301 redirect from http://example.com/node/25 to http://example.com/page-title.
  2. It will remove trailing slashes from URLs.
     For example, the Global Redirect Module will redirect a request for http://example.com/page-title/ to http://example.com/page-title. If search engines spider both versions, they will see two different URLs with duplicate content.
  3. If a requested URL is being used as Drupal's front page, it will 301 redirect to the actual front page.
     For example, if you are using the path frontpage as your site's front page, a request for http://example.com/frontpage will 301 redirect to http://example.com/.
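The three features can be summed up in a few lines of logic. The sketch below is my own model of the behaviour described above, not the module's code: paths are written without the leading slash, the alias table is a plain dictionary, and the function name is invented for the example.

```python
def global_redirect(path, aliases, front_page):
    """Return the path to 301-redirect to, or None if no redirect is needed.

    path       -- requested path without the leading slash, e.g. "node/25"
    aliases    -- dict mapping internal paths to URL aliases
    front_page -- the path (internal or alias) used as the site's front page
    """
    if path == "":
        return None  # already the site root

    # Remove trailing slashes.
    stripped = path.rstrip("/")

    # Internal path -> URL alias, if one exists.
    target = aliases.get(stripped, stripped)

    # Front page path -> site root.
    if front_page in (stripped, target):
        target = ""

    return None if target == path else target


if __name__ == "__main__":
    aliases = {"node/25": "page-title"}
    for requested in ("node/25", "page-title/", "frontpage", "page-title"):
        print(repr(requested), "->", repr(global_redirect(requested, aliases, "frontpage")))
```

An empty string as the result stands for the site root (http://example.com/), and None means the request is served as-is.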

If you search around the Web for Drupal SEO tutorials, many people recommend using mod_rewrite rules in an .htaccess file to deal with issues like removing trailing slashes. But on sites that also have non-Drupal content, you may have URLs that do have trailing slashes.

A slash is the symbol for a directory. For example, in the URL http://example.com/ the trailing slash is the symbol for the root directory of example.com. If you leave the trailing slash off, the server will add it. If you request a physical directory on a Drupal site (or any site) like http://example.com/modules the server will correct you by appending the trailing slash: http://example.com/modules/. If you have non-Drupal content on your server—perhaps a WordPress blog at http://example.com/software/—you will have URLs with trailing slashes. The WordPress blog would not be located at http://example.com/software, it would be located at http://example.com/software/. You would not want to remove trailing slashes from all URLs.

That is why the Global Redirect module is a good option. It will only remove trailing slashes from URLs that are handled by Drupal.

Image Module

The Image Module allows users to upload images as nodes.

It creates duplicate content on sites—at least two duplicate URLs for every image node created.

Drupal's Image Module screenshot

In the image above, the link to "Thumbnail" appends the query string ?size=thumbnail to the URL and redisplays the content. Once you are on the thumbnail page, a link to the preview page will be displayed. If you have allowed anonymous users to "View Original Image" in the Access Control settings, then there will be an additional link to the original image.

The URLs of duplicate content that a default installation of the Image Module creates are shown below:

  • http://example.com/node-title — this is the actual node's URL
  • http://example.com/node-title?size=thumbnail
  • http://example.com/node-title?size=preview
  • http://example.com/node-title?size=_original

The names of the image sizes are controlled through the Image Module settings at http://example.com/admin/settings/image:

Drupal's Image Module settings

So, for example, if you created an additional image size called tiny, the Image Module would then create an extra page of duplicate content for each image node on the site by appending ?size=tiny to the original URLs of the nodes.

To fix this issue, add the following line to your robots.txt file:

Disallow: /*size=

Paging Module

Drupal's Paging Module is a popular way to break up nodes across multiple pages. This module does create some SEO problems, though.

Drupal's Paging Module

As shown in the image above, the Paging Module is able to break up each node into multiple pages which creates more URLs. For example, if the page above had the URL http://example.com/page-title, the following other URLs would be created for the paginated views:

  • The number 2 would link to http://example.com/page-title?page=0,1.
  • The number 3 would link to http://example.com/page-title?page=0,2
  • And when you are on either of those two sub-pages, the number 1 would link back to the first page as http://example.com/page-title?page=0,0 instead of its original URL at http://example.com/page-title.

That results in a single page of content with two different URLs: http://example.com/page-title?page=0,0 contains duplicate content of the node's main URL http://example.com/page-title.

Temporary SEO Fix

The current SEO fix is to add the following line to your robots.txt file to prevent the duplicate pages from being indexed:

Disallow: /*?page=0,0$

The syntax in the above robots.txt rule is recognized by Google Search, Yahoo Search, and MSN Live Search.

Module development recommendations

Future versions of this module should be built so that the main URL is not duplicated. The link back to the main node page should not have a query string. Also, it would be best if the URLs that it generates were not dynamic.

The following example shows a possible URL structure for this module that would be better for search engine indexing:

  • Main node URL: http://example.com/page-title
  • First pagination: http://example.com/page-title/1
  • Second pagination: http://example.com/page-title/2
  • and so on...
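
As a rough illustration of that mapping, the following Python sketch converts the module's current query-string URLs into the structure suggested above. It is not code from the Paging Module; the function name is hypothetical, and it assumes the last number in the page value is the sub-page index, as in the examples earlier.

```python
from urllib.parse import urlsplit, parse_qs

def proposed_url(current_url):
    """Map a ?page=0,N style URL onto the suggested /N path structure."""
    parts = urlsplit(current_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    page_values = parse_qs(parts.query).get("page")
    if not page_values:
        return base  # already the main node URL
    # Assumption: a value such as "0,2" ends with the sub-page index.
    index = int(page_values[0].split(",")[-1])
    # Index 0 is the first page, which should keep the main URL.
    return base if index == 0 else f"{base}/{index}"

for url in ("http://example.com/page-title?page=0,0",
            "http://example.com/page-title?page=0,1",
            "http://example.com/page-title?page=0,2"):
    print(url, "->", proposed_url(url))
```

The important case is the first one: the link back to page one maps to the main URL itself, so no duplicate is created.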

Path Module

The Path Module is a Drupal core module. Enabling it allows you to create URL aliases.

Drupal's standard URLs (once you have enabled "clean URLs" in the Admin panel) are in this format:
http://example.com/node/25

Once you have enabled the Path Module you will be able to create URL aliases for each URL. If you created a URL alias for that URL (node 25) called custom-page-title, you would then be able to access the content of node 25 at http://example.com/custom-page-title.

You would also still be able to access the content of node 25 at http://example.com/node/25. Generally, you do not have to worry about this unless your site has already been indexed with the original "node" URLs. In either case you could install the Global Redirect Module which would automatically redirect http://example.com/node/25 to your URL alias at http://example.com/custom-page-title.

A related module is the recommended Pathauto Module which automatically creates URL aliases for each node on your site.
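
The redirect behaviour described above can be sketched in a few lines of Python. This is only a model of the idea, not the Global Redirect Module's actual code; the alias table and function name are made up for the example.

```python
# Hypothetical alias table: internal Drupal path -> URL alias.
ALIASES = {"node/25": "custom-page-title"}

def resolve(path, aliases=ALIASES):
    """Return (status, location) for a requested path.

    A request for an internal path that has an alias gets a 301 to the
    alias; anything else is served as-is with a 200.
    """
    internal = path.strip("/")
    if internal in aliases:
        return 301, "/" + aliases[internal]
    return 200, "/" + internal

print(resolve("/node/25"))            # redirected to the alias
print(resolve("/custom-page-title"))  # served directly
```

With a permanent redirect in place, each node ends up with one URL that search engines can index.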

Spam Module

The Spam Module filters content and comments for spam, and lets users flag content for review by the administrators.

The Drupal Spam Module creates URLs on the site like:
http://example.com/spam/report/comment/1

To prevent low-quality pages from being indexed, add the following line to your robots.txt file when using the Spam Module:

Disallow: /spam/

Tracker Module

The Tracker Module creates "track" pages for each user.

For example, the tracker page for user #234 would be located at http://example.com/user/234/track. Those pages should be blocked from search engines with the following rule:

Disallow: /*/track$

That robots.txt syntax is recognized by Google Search, Yahoo Search, and MSN Live Search.

The Tracker Module also lists recent posts on the site at URLs like http://example.com/tracker. On large sites it creates thousands of paginated tracker pages like http://drupal.org/tracker?page=6379.

My recommendation is to leave the first page of the Recent Posts (http://example.com/tracker) exposed to search engines, while blocking the paginated tracker pages like http://example.com/tracker?page=50. Leaving just the first page of /tracker exposed allows search engines to rapidly find and index your latest content as it is posted.

The following rule blocks all but the first of your site-wide tracker pages:

Disallow: /tracker?

Drupal SEO and Case Sensitive URLs

Search engines like Google and Yahoo are based on Unix (Linux or BSD). Unlike on Windows, filenames on Unix servers are case-sensitive. That means a file called INDEX.HTML is a different file than index.html.

Drupal has an SEO issue where URLs are not case sensitive. I'll explain why this is a problem.

Here's an example of case-sensitive URLs on a Unix server—Google.com:

google-com.png

One page with more than one URL can be seen as duplicate content in the eyes of search engines. A page that shows the same content regardless of the letter case of the URL is showing duplicate content.

Here's an experiment I did with Drupal and case-sensitive URLs. It shows that both versions are indexed by Google as duplicate content:

google-case-sensitive-URLs-drupal.png

I posted an issue here. I think it's a MySQL problem. Here's the code from the Drupal 5.7 Path Module:

case 'load':
  $path = "node/$node->nid";
  // We don't use drupal_get_path_alias() to avoid custom rewrite functions.
  // We only care about exact aliases.
  $result = db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path);
  if (db_num_rows($result)) {
    $node->path = db_result($result);
  }
  break;

Here's what MySQL.com says:

The default character set and collation are latin1 and latin1_swedish_ci, so non-binary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.

I think that it's something that needs to be fixed in the Path Module and/or added to the Global Redirect module.

A Drupal site isn't going to be affected by this naturally. It would only happen if someone working on the site manually links to URLs in a different case than the case of the Drupal URL aliases. I wouldn't call it a "critical" issue, but it definitely should be fixed as soon as possible. Theoretically it could be used to maliciously affect a site's rankings.
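
To make the collation issue concrete, here is a small Python sketch. It uses SQLite rather than MySQL so that it runs anywhere: SQLite's NOCASE collation stands in for MySQL's case-insensitive default and BINARY for a case-sensitive comparison. The table and column names follow the url_alias query above, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (src TEXT, dst TEXT)")
conn.execute("INSERT INTO url_alias VALUES ('node/25', 'custom-page-title')")

def lookup(alias, case_sensitive):
    """Find the internal path for an alias under the chosen collation."""
    # NOCASE imitates a case-insensitive (_ci) collation, BINARY an exact one.
    collation = "BINARY" if case_sensitive else "NOCASE"
    row = conn.execute(
        f"SELECT src FROM url_alias WHERE dst = ? COLLATE {collation}",
        (alias,),
    ).fetchone()
    return row[0] if row else None

# Case-insensitive comparison: the wrong-case URL still finds the node.
print(lookup("Custom-Page-Title", case_sensitive=False))
# Case-sensitive comparison: only the exact alias matches.
print(lookup("Custom-Page-Title", case_sensitive=True))
print(lookup("custom-page-title", case_sensitive=True))
```

With the case-insensitive comparison every capitalization of the alias resolves to the same node, which is how one page ends up with many URLs.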

Comments and opinions welcome.

Drupal SEO: "404 Ok" and .htaccess

NOTE: This tutorial is no longer current. Please see the Drupal SEO Tutorial for current information on Drupal 5 and Drupal 6.

There are two issues in Drupal 4.7 that may cause problems with search engine spiders.

Drupal .htaccess: Redirecting to www

Tip: .htaccess is only used with Drupal on the Apache web server. If you are using Windows and want to install Apache, try Apache2triad which includes Apache, PHP, MySQL, Perl, Python, and much more. Apache2triad installs with a double-click. You can run Drupal on IIS, but I don't think it's a good idea.

If you don't know what URL canonicalization is, read this first.

The default .htaccess in Drupal 4.7 has some lines that you can uncomment to redirect your visitors in one of the following two ways:

  1. http://example.com to http://www.example.com
  2. http://www.example.com to http://example.com

This is the relevant section of the default Drupal .htaccess file — it is a bad idea to use this code on your site:

  RewriteEngine on

  # If your site can be accessed both with and without the prefix www.
  # you can use one of the following settings to force user to use only one option:
  #
  # If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  # RewriteRule .* http://www.example.com/ [L,R=301]
  #
  # If you want the site to be accessed only WITHOUT the www. , adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  # RewriteRule .* http://example.com/ [L,R=301]

It is a bad idea to use these default RewriteRules because they will only redirect to the Drupal home page. For example, they will redirect a request for http://example.com/MyPage to http://www.example.com/, when it should redirect to http://www.example.com/MyPage. A site should redirect to the requested page, not back to the home page.

This default Drupal .htaccess file is dangerous because external web sites might link to a page on your site like http://example.com/MyBestPage and if you use the default Drupal RewriteRules it will redirect the search engines (and visitors) to http://www.example.com/ — the "www" version of your home page; not the intended page. Don't risk confusing the search engines with 301 (permanent) redirects to your home page when you don't intend for them to go to your home page.

To fix this problem, use the following lines in your Drupal .htaccess file instead, right after the line that says RewriteEngine On, replacing example.com with your domain name:

  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If you prefer to remove the www then use the following rule instead:

  RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  RewriteRule (.*) http://example.com/$1 [R=301,L]
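
The difference between the default rules and the fixed ones is easy to miss, so here is a rough Python simulation of the "www" variant. It only models the host check and the $1 backreference (query strings and the other mod_rewrite flags are ignored), and the function name is my own.

```python
import re

def simulate_redirect(host, path, rule_pattern, substitution,
                      host_pattern=r"www\.example\.com"):
    """Return the redirect target, or None if no redirect would happen."""
    # RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    if re.fullmatch(host_pattern, host, re.IGNORECASE):
        return None
    # In .htaccess context the rule sees the path without its leading slash.
    match = re.match(rule_pattern, path.lstrip("/"))
    if match is None:
        return None
    # mod_rewrite writes the first captured group as $1.
    return match.expand(substitution.replace("$1", r"\g<1>"))

DEFAULT_RULE = (".*", "http://www.example.com/")
FIXED_RULE = ("(.*)", "http://www.example.com/$1")

print(simulate_redirect("example.com", "/MyPage", *DEFAULT_RULE))
print(simulate_redirect("example.com", "/MyPage", *FIXED_RULE))
```

The default rule throws the requested path away, while the fixed rule captures it with (.*) and puts it back with $1.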

Tip: If you want to know the details on how those rewrite rules work, check out this mod_rewrite cheat sheet.

Drupal's 404 Ok Error

Drupal has a problem when you are running PHP as CGI. Instead of sending "404 Not Found" errors when it can't find a page, it will send "404 Ok" errors. You can read more about it on PHP.net. When a search engine spider requests a page that doesn't exist, you want to send a proper "404 Not Found" header.

To see if you are sending faulty "404 Ok" headers, you can use Firefox with the LiveHTTPheaders extension. After you have installed that extension and restarted Firefox, go to Tools —> Live HTTP headers. That will open up the header-viewer window. Then go to your web site to a page that doesn't exist (like http://www.example.com/asdf1234). Check the LiveHTTPheaders window to see if it sends a correct "404 Not Found" header or an incorrect "404 Ok" header. If it says "404 Ok", then there is a problem and you can fix it as explained below.

To fix the Drupal 404 error problem, open up the file /includes/common.inc. In Drupal 4.7.3, it is about line 288 where you will find this:

  drupal_set_header('HTTP/1.0 404 Not Found');

Change that line to:

  drupal_set_header('Status: 404 Not Found');

Then check your headers again in Firefox with LiveHTTPheaders. If it says "404 Not Found" then your problem is solved. If it doesn't work, leave a comment below and let me know...
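
If you would rather check a status line with a script, the following Python sketch compares the reason phrase with the standard one for that status code. It works on a status line you paste in (for example, copied from LiveHTTPheaders); the function name is mine, and it raises ValueError for codes Python's http module doesn't know.

```python
from http import HTTPStatus

def check_status_line(status_line):
    """Return (code, reason, matches_standard) for an HTTP status line."""
    _version, code, *reason_words = status_line.strip().split(" ")
    reason = " ".join(reason_words)
    # The standard reason phrase for this code, e.g. "Not Found" for 404.
    expected = HTTPStatus(int(code)).phrase
    return int(code), reason, reason.lower() == expected.lower()

print(check_status_line("HTTP/1.1 404 OK"))
print(check_status_line("HTTP/1.1 404 Not Found"))
```

A result ending in False means the server is sending a mismatched header such as "404 OK".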

Update: PHP "301 OK" Header Errors

As mentioned below in the comments there is also a "403 OK" error that can exist on some configurations. For an example on how to fix the similar "301 OK" PHP header error, see my post on PHP redirects.

Drupal.org Bug Report

See also the Drupal bug report page for this problem.

Note: The HTTP 1.0 specifications say that "the Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase." However, I did have a problem where Google would not remove some of my pages even with the manual removal tool until I fixed the headers from "404 OK" to "404 Not Found". I'm not sure what the current status on this issue is, but to be safe I recommend sending a correct "404 Not Found" header. Not all bots may be programmed according to the standards.

Robots.txt and Drupal

An important aspect of Drupal SEO is the robots.txt file. Drupal 5 was the first version of Drupal that came with a robots.txt file, but it still needs some modifications.

One of the most serious SEO problems with Drupal is duplicate content. With the addition of contributed modules it can get so bad that one might refer to it as druplicate content. (ow...)

A key element of SEO on sites is getting a good, clean crawl. A robots.txt file is important for a clean crawl because it tells robots where they aren't supposed to go. There are many places on a Drupal site that search engine crawlers shouldn't go.

Drupal's Default Robots.txt File

I've attached Drupal 5's default robots.txt file for reference and will address it in sections:

Crawl Delay

The first thing I would do is remove the Crawl-delay line. Unless you have a very large site or spidering problems, it's not needed. The other robots.txt rules that I mention here should help cut down on the number of pages crawled.

User-agent: *
Crawl-delay: 10

Directories

The next section of the default robots.txt file addresses the physical directories created by Drupal:

# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/

That section can be left as-is. Just keep in mind that it will probably keep search engines out of your logo and image files also because you are blocking your /sites/, /modules/, and /themes/ directories. If you use an alternate logo image, rename it so that it includes a keyword and place it in your /files/ directory.

Files

The next section addresses files that are included with Drupal. I've never seen any of these files indexed, but you can leave this section in if you wish. Don't delete your CHANGELOG.txt file as some people recommend, because it lets you know what version of Drupal you are running in case you forget later.

# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt

Paths

This is the most important section of the default robots.txt file because it contains some errors:

# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/

Drupal doesn't have trailing slashes on the URLs, so you may want to remove trailing slashes from some of the rules as shown below:

Disallow: /admin/
Disallow: /aggregator
Disallow: /comment/reply/
Disallow: /contact
Disallow: /logout
Disallow: /node/add
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login

For example, each "Login or register to post comments" link on each node creates URLs like http://example.com/user/login?destination=comment/reply/806%2523comment_form and http://example.com/user/register?destination=comment/reply/806%2523comment_form. Drupal's default robots.txt rules will not block search engines from spidering those URLs, but if you remove the trailing slashes as I've mentioned above, it will.

The Aggregator Module creates URLs of duplicate content like http://example.com/aggregator?page=3 that are not blocked by the default robots.txt file. Removing the trailing slash on the end of "/aggregator/" in the default robots.txt file will solve that problem.
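
Here is a short Python sketch of why the trailing slash matters. It assumes plain prefix matching, which is how the basic robots.txt Disallow rules shown here work: a URL is blocked when it starts with the rule's path. The helper name is hypothetical.

```python
def blocked_by(rule, url_path):
    """Basic robots.txt matching: the rule is a prefix of the URL path."""
    return url_path.startswith(rule)

login_url = "/user/login?destination=comment/reply/806%2523comment_form"

print(blocked_by("/user/login/", login_url))  # default rule with slash
print(blocked_by("/user/login", login_url))   # rule without the slash
print(blocked_by("/aggregator/", "/aggregator?page=3"))
print(blocked_by("/aggregator", "/aggregator?page=3"))
```

Keep in mind that a shorter prefix also matches more: under the same logic, a rule like /contact would block a page aliased as /contact-us, so check your own URL aliases before removing slashes.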

Paths (no clean URLs)

The next section of the robots.txt file addresses paths that should be blocked if you aren't using clean URLs:

# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

UPDATE: Please ignore the following lines. Further testing has shown that this rule will block all dynamic URLs in Google. So don't use it!

Most of the people reading these Drupal SEO tutorials are using clean URLs. If you are using clean URLs you can delete that section and replace it with the following line:

Disallow: /?

That line would block all of the URLs that start with ?q= as well as other miscellaneous query strings that might later appear for various reasons.

If you are not using clean URLs, modify the above section using the same logic as for the "clean paths" section above it. If your site has been indexed without clean URLs (for example, the page http://example.com/?q=node/25 has PageRank) and you are going to implement clean URLs, you should use .htaccess to do 301 redirects from the dynamic versions of the URLs to the clean ones. In that case do not block the dynamic URLs from search engines, because you want them to transfer the PageRank of the dynamic URLs to the clean URLs. If that issue applies to you and my explanation doesn't make sense, please let me know in a comment below and I'll try to explain it another way.
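
The mapping that such a redirect has to perform can be sketched in Python as follows. This is only an illustration of the URL logic, not an .htaccess rule; the function name is hypothetical, and it assumes any other query parameters should be carried over unchanged.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def clean_url_redirect(url):
    """Return (301, clean_url) for a ?q= style URL, or None otherwise."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    q_values = [value for key, value in params if key == "q"]
    if not q_values:
        return None  # nothing to redirect
    # Keep every parameter except q, which becomes the path.
    rest = [(key, value) for key, value in params if key != "q"]
    target = f"{parts.scheme}://{parts.netloc}/{q_values[0].lstrip('/')}"
    if rest:
        target += "?" + urlencode(rest)
    return 301, target

print(clean_url_redirect("http://example.com/?q=node/25"))
print(clean_url_redirect("http://example.com/node/25"))
```

The status code is the important part: a 301 tells search engines that the move is permanent, so the clean URL should replace the dynamic one in the index.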

Additional Rules

I also recommend adding the following rules, after carefully reading and understanding the explanations given with them:

Each module potentially adds many extra URLs on the site which often create massive amounts of duplicate content and that also increase