ColdFusion Muse

Search Engine Safe URLs and Semantic Parameters

Mark Kruger | April 22, 2008 11:41 AM | Hosting and Networking | Comments (4)

This topic crops up frequently in our line of work. Among the items that are often listed as important to search engines are "search engine safe" (SES) URLs. It has been pointed out that Google will index just about anything - including obscure looking URLs with cryptic parameters on them. Although this is true, we shall see that it does not exempt the developer from paying attention to the URL when he or she is thinking about search engine optimization. Let me explain.

Recently on CF-Talk, Rizal Firmansyah of MasRizal & Partners (a consulting firm) gave this excellent example. He did a Google search for CFX_Excel (a noted and useful CFX tag for working with Excel documents). You can see the results yourself at this link. Rizal pointed out that the top-ranked link in the results pointed to a URL on his own corporate site (they apparently own this tag), and it included URL parameters. It looks like this:

index.cfm?fuseaction=idea.download_detail&ProductID=cfx_excel

Immediately underneath the top choice was a link to cftagstore that looked like this:

/tags/cfxexcel.cfm

Obviously the URL with parameters managed to rank ahead of the URL without parameters. Of course, there are many reasons why this could be the case. Other examples on the page appeared both with and without URL parameters. Rizal's point was that his site was listed first in spite of the fact that it had a hefty URL parameter payload. Doesn't this prove that SES URLs are irrelevant? The answer to that is (in the words of the elves to Bilbo) both yea and nay...

Semantics and URL Parameters

The thing to note here is that just because links with parameters are accepted and indexed does not mean you should ignore how they are constructed. Whatever is in that link is recognized as having some semantic significance to search engines. In Rizal's example the top choice was a page devoted to CFX_Excel (so the content was on point), and the URL specified some important details that the search engine probably picked up. I'm just making an educated guess, but I would wager that the word "download" and the value of productID being "cfx_excel" were both favorable indicators for the search engine. If, as on a lot of ecommerce sites, the URL had been "ProductID=049302", I think it would have made some small difference. Google may still have ranked it first, but it would have had one less favorable indicator.

SES URLs With Parameters

My point is that, if you are concerned about search engines, you should pay attention to the construction of your URLs. Go ahead and use parameters, but don't be afraid to include some useful information in them. If you have a URL that looks like "...cfm?cityid=342", take the time to make it look like "...cityid=342&city=Chicago". Sure, you don't "need" anything more than the cityID on the URL, but the additional information makes sense to the search engine and gives you one more arrow in your quiver.
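Here is a minimal sketch of what that might look like in CFML, assuming a hypothetical query named getCity with cityID and cityName columns (neither name comes from the example above). URLEncodedFormat() keeps spaces and punctuation in the readable value URL-safe:

<cfoutput query="getCity">
    <!--- The numeric id still drives the lookup; the readable city name is purely a semantic hint --->
    <a href="detail.cfm?cityid=#getCity.cityID#&city=#URLEncodedFormat(getCity.cityName)#">
        #getCity.cityName#
    </a>
</cfoutput>

The page receiving the link can ignore url.city entirely; it is there for the search engine and for the human reading the address bar.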


4 Comments

  • Posted By rish | 4/22/08 10:13 AM
    It's interesting how this topic surfaces, drops off, and resurfaces from time to time.
    It's _important_ to have a very controlled policy about content that's passed on the URL. I have found, time and time again, that prototyped applications foolishly go to production with dangerous URL capabilities, such as record deletes.

    Interesting citation: http://news.ycombinator.com/item?id=165896
  • Posted By CheyenneJack | 4/22/08 10:18 AM
    This is a good article and a great point. While, yes, some sites will rank highly despite not using a particular technique, one has to realize that the black magic of SEO is based upon many techniques.

    One paradigm I try to use when developing our SEO sites is to treat each page like an entry in a library catalog. The title and the URL are the two quickest ways people (and search engine bots) try to understand what the page is about.

    Another thing to consider is that, as a site evolves, a person should be able to almost guess where its typical pages are. For instance, the blog at my site "should" be at www.cheyennejack.com/blog; other pages like /about and /faq could also become standard areas people tend to start memorizing.

    Finally, URL rewriting can help a lot of folks out there, especially if you are using front-controller frameworks like Model-Glue or Fusebox. Utilizing a tool like ISAPI Rewrite can give you easy-to-find, cruft-free URLs while still programming in your framework and keeping your files where you want them. (A rough example of such a rule is sketched after the comments.)
  • Posted By Carl Dickson | 4/23/08 2:20 PM
    Where I have run into problems with parameters in the URLs is when you use them for information that is not unique or that points to the same page. For example, I use URL parameters that tell me which ad or site sent a visitor. Google (and other engines) has trouble understanding that the landing page is the same: it sees two or more URLs and doesn't know whether that is one page or several, or which URL is preferred. If something like this is implemented poorly, it can actually send the spiders into an infinite loop or cause them to hit the same page over and over with different parameters. Sometimes the search engine will show the wrong page in the results, and it can dilute page rank and possibly earn a duplicate content penalty. Another problem is that when people bookmark the page or link to a page, they often use different versions of the URL. Google sees them as links to different pages.

    One trick I have tried is to create search-engine-friendly file names, put the parameters as variables in each file, and then include the actual page as a template (a small sketch of this appears after the comments). It's a pain to have to change a bunch of files to implement a change, but it should be rare that you change the parameters (as opposed to the template). I'm running an experiment like this, but it seems like it can take months for some changes to show up in the results.

    Another tip for SEO: instead of ID numbers, use relevant keywords if you can. The search engines like to see keywords in the URLs, even if they are URL parameters.
  • Posted By Adam Haskell | 4/24/08 7:46 AM
    @rish Technically, GET HTTP requests should never cause state changes on a server, so that is not even following the semantics of HTTP. State changes should always be made by POST, or, depending on your content, you might even want to accept PUT or DELETE requests. In all honesty I don't think I follow it 100% of the time, but I figured I'd throw it out there :) (A small guard along these lines is sketched after the comments.)

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.h...
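A rough sketch of the kind of rewrite rule CheyenneJack describes, assuming ISAPI_Rewrite 3's Apache-compatible syntax (the fuseaction and parameter names are hypothetical, not taken from any particular framework). It maps a cruft-free path onto the front controller's query string without moving any files:

RewriteEngine on
# Map a clean path like /tags/cfxexcel onto the framework's front controller
RewriteRule ^/?tags/([a-zA-Z0-9_-]+)$ /index.cfm?fuseaction=tags.detail&tag=$1 [NC,L]

The framework never knows the difference; the clean path is what visitors and spiders see.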
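Carl's file-per-page trick might look something like this in CFML: a thin stub whose file name carries the keywords, while one shared template does the real work (the file, template, and variable names here are hypothetical):

<!--- chicago.cfm: the search-friendly file name is the point; the shared template renders the page --->
<cfset url.cityid = 342>
<cfset url.city = "Chicago">
<cfinclude template="citydetail.cfm">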
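And on Adam's point about GET and state changes, a simple guard like this (with a hypothetical fuseaction name) keeps a plain link, or a spider crawling one, from triggering a delete:

<!--- Refuse state-changing actions that arrive via GET --->
<cfif structKeyExists(url, "fuseaction") and url.fuseaction eq "record.delete"
      and cgi.request_method neq "POST">
    <cfabort showerror="Deletes must be submitted via POST, not linked via GET.">
</cfif>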