Nov 26 2011

Improve your site’s SEO with Basic URL rewriting/redirects using mod_rewrite in .htaccess

Category: .htaccess
Ulrich Palha @ 12:19 am

Overview

If you host your own site on shared Linux hosting using Apache, and you are interested in SEO, then there is a good chance that you will want to rewrite and/or redirect your URLs. One of the ways to do this is to use .htaccess. I will cover a couple of the common examples that I have come across recently, together with a detailed explanation of each.

Keyword Rich URLs

Let's assume that you have a book application that you currently access via http://www.mysite.com/books/index.php?id=1590595610 and you want to transform it to a keyword-rich URL format like http://www.mysite.com/Definitive-Guide-Apache-mod_rewrite-Guides/ep/1590595610. This change is relatively easy and can be done by adding the code below to a file called .htaccess that you place in the root folder of your www.mysite.com domain.

RewriteEngine On
RewriteBase /

RewriteRule ^.*/ep/(.*) books/index.php?id=$1 [L]
  • Line 1:  enables the runtime rewriting engine, mod_rewrite, and should be close to the top of your .htaccess file and before any RewriteRule or RewriteCond directives.
  • Line 2: sets the base URL for rewrites. In this case it is set to the root of the site, but if you were editing the .htaccess in the blogs folder, which corresponds to /blogs in the URL, then you would set the base to /blogs. It should be explicitly set in every .htaccess file, as the default is the physical directory path.
  • Line 4:  consists of 3 parts after the RewriteRule directive:
      1. The pattern to match, ^.*/ep/(.*). This pattern is a Perl-compatible regular expression that is applied to the path of the URL being requested, and it will match any URL that has an /ep/ in it.
        This pattern is too general: it also matches many URLs that you did not expect, since anything at all may appear before and after the /ep/. A stricter pattern would anchor both ends and limit what the ID may contain. The parentheses at the end, (.*), create what is called a backreference, which saves everything after the /ep/ and allows it to be used later.
      2. The substitution, books/index.php?id=$1, is a string that replaces the original URL that was matched and tells mod_rewrite what to do next. In this case, it says to direct the original URL request to a file called index.php in the books folder. It also passes in a query string parameter, id=$1. The $1 is the backreference created in the pattern and, using the URL example above, will contain the value 1590595610.
      3. Finally, the flag, [L], is used to tell mod_rewrite that if the pattern for this RewriteRule matches, then it should stop processing further rules. Unless you have a reason to do otherwise, you should always use the [L] flag with every RewriteRule directive.
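
To illustrate the RewriteBase point from Line 2, here is a minimal sketch of what a .htaccess file placed in the blogs folder might look like. The index.php script, the /p/ marker and the numeric post ID are hypothetical and only serve to show how the base affects a relative substitution.

```apache
# Hypothetical /blogs/.htaccess (script name and URL shape are assumptions)
RewriteEngine On

# Relative substitutions below are resolved against /blogs, not the site root
RewriteBase /blogs

# /blogs/some-post-title/p/42  ->  /blogs/index.php?p=42
# In .htaccess the pattern sees the path relative to this folder,
# so it starts with the title segment and has no leading slash.
RewriteRule ^[-a-zA-Z0-9_]+/p/([0-9]+)$ index.php?p=$1 [L]
```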

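
One way the pattern could be tightened is sketched below. It assumes that the keyword segment contains only letters, digits, hyphens and underscores and that the ID is a 10 to 13 digit number, similar to the shape that the RewriteCond in the next section checks for. Treat it as an illustration under those assumptions rather than the only correct pattern.

```apache
RewriteEngine On
RewriteBase /

# One keyword segment, then /ep/, then only a 10-13 digit ID, anchored at both ends.
# The underscore is allowed so that titles containing "mod_rewrite" still match.
RewriteRule ^[-a-zA-Z0-9_]+/ep/([0-9]{10,13})$ books/index.php?id=$1 [L]
```

With this version, /Definitive-Guide-Apache-mod_rewrite-Guides/ep/1590595610 is still rewritten, while a path such as /a/b/ep/anything is left alone.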

Reduce Duplicate Content

Now that you have keyword-rich URLs like http://www.mysite.com/Definitive-Guide-Apache-mod_rewrite-Guides/ep/1590595610, more people are linking to your site, and you notice that many of the incoming URLs have additional parameters that you did not put there. You also know that this causes search engines to treat these URLs as duplicate content. You wonder how you can easily get rid of these extraneous query string parameters. Fortunately, mod_rewrite and .htaccess can also help you here.

Since your URLs do not need ANY query string parameters, the rule you would like to create is:

If a request for one of your brand new keyword-rich book URLs has any query string parameters, then you want to 301 redirect the request to the same URL without any query string parameters.

However, you cannot access query string parameters using the directives you know so far (RewriteRule etc.), so we will use a new directive called RewriteCond that allows you to access everything in the HTTP request including headers and query string values.

RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_URI} ^(/[-a-zA-Z0-9_]+/ep/[0-9]{10,13})$ [NC]
RewriteCond %{QUERY_STRING} !^$
RewriteRule . %1? [L,R=301]

RewriteRule ^.*/ep/(.*) books/index.php?id=$1 [L]

Above is the updated .htaccess file that has 3 new lines, 4-6. Since you want to redirect requests with extraneous query string parameters BEFORE you serve them content, the new rules are placed before the rule to enable keyword rich URLs. In general you place rules that should execute first (and are generally more specific) higher up in the .htaccess file, while rules that are more general that you want to execute later are placed further down in the file.

  • Line 4: the first RewriteCond examines the REQUEST_URI, which is everything in the URL after the domain name, including the leading slash, up to but not including the query string, and ensures that it matches our book product URL. Note that the parentheses around the whole pattern capture a backreference to the entire REQUEST_URI that we will use later. The [NC] flag instructs mod_rewrite to be case-insensitive (NoCase).
  • Line 5: the second RewriteCond directive examines the query string, and it will only match if the query string is not empty.
  • Line 6: if the conditions in lines 4 and 5 are met, then the RewriteRule in line 6 will be applied (it's actually the other way round, since the RewriteRule pattern is tested first and the conditions afterwards, but it's generally safe to think of it like this). The single dot matches any non-empty URL path, and the substitution is the backreference to the URI captured in Line 4. Note that a %, not a $, is used to refer to backreferences from a RewriteCond. The ? after the backreference tells mod_rewrite to discard any query string parameters. The new R=301 in the flags tells mod_rewrite to do a 301 (permanent) redirect.
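
On Apache 2.4 and later, the same redirect can arguably be expressed with the QSD (query string discard) flag instead of the trailing ?. The following is a sketch under that assumption. It is not part of the rule set above, and it folds the URL check into the RewriteRule pattern rather than using a RewriteCond on REQUEST_URI.

```apache
# Assumes Apache 2.4+, where the QSD flag is available
RewriteEngine On
RewriteBase /

# Only act when the request carries a query string
RewriteCond %{QUERY_STRING} !^$
# Redirect the book URL to itself; QSD drops the query string
RewriteRule ^([-a-zA-Z0-9_]+/ep/[0-9]{10,13})$ /$1 [NC,L,R=301,QSD]
```

Because the pattern is matched in the RewriteRule itself, the backreference is written as $1 here rather than %1.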

Summary

You can improve your site’s SEO with basic URL rewriting and redirects using mod_rewrite in .htaccess. To find out how much more you can do with mod_rewrite and .htaccess, take a look at some of the links below.

References/Further Reading

Mastering Regular Expressions by Jeffrey Friedl: this is the book to read if you want to fully master regular expressions and understand how the various regex engines work.
The Definitive Guide to Apache mod_rewrite: this book is a great introduction to mod_rewrite with plenty of examples.
Apache mod_rewrite and .htaccess documentation: get the facts straight from the source.
