In this post, Part 2 of our search engine series, we will discuss coding and design practices that make your pages easier for search engines to index and more likely to rise in page rank. In Part 1 of this series we made the case that your web site needs valuable and fresh content to really be useful to search engines. Without useful content your web site will not be a destination that anyone wants to visit, and therefore it will not be something that search engines (who are customer focused) want to index. Keep Part 1 in mind as we discuss what you can do with your code and your pages. Unless you have solved the puzzle of maintaining fresh and valuable content on your web site, you will be spinning your wheels.
Of course, you can use certain techniques to get yourself ranked high - at least temporarily. But my guess is that you will spend just as much time changing your code in a running battle to stay on top of Google as you would if you learned to write and maintain good information on your site. By the way, those "black-hat techniques" are not discussed in this post. This post is about preparing your valuable content to be consumed by a search engine that wants and needs it - not about tricking a search engine into indexing less-than-worthy content. With that in mind...
You remember the header? It's that thing above the <body> tag. It has some important items in it that are relevant to search engines. Here's the header from Part 1 of this series:

Let me say at the outset that it is important to have a document declaration. Your browser attempts to use this declaration to render your page. The DTD referenced by the document declaration determines the "rules" for how the entities in your page are displayed. In our case we are using HTML 4.01 Transitional with the "loose" DTD. That means the browser will not try to strictly enforce the rules; it will make its "best effort" to render the page. To see how this works, take an older page with lots of tables and inline styles and change the document declaration to strict:
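The original snippets aren't reproduced here, but the two declarations being contrasted look like this - the transitional declaration with the "loose" DTD, and the strict one you can swap in to see the difference:

```html
<!-- HTML 4.01 Transitional: the browser renders in forgiving, "best effort" mode -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Strict: swap this in on an older, table-heavy page and watch the layout change -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
```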
What does that have to do with search engines? The rule of thumb is that you should have a document declaration and that your page should conform to that declaration. Does that mean search engines will validate your HTML or XHTML to see if it meets the requirements? That is highly unlikely, and not just because it would be a vast undertaking. The truth is that almost no web sites validate correctly. As a test I ran some sites through the W3 Validator with some surprising results:
One of my developers, Jason Herbolsheimer, came back from MAX having attended a search engine workshop with Figleaf's own Steve Drucker. Jason's take on Steve's session was that the title is possibly the most important item on any given page for search engine ranking. In other words, every page should have a title and the title should reflect the content on the page. In this respect Ray's blog does a nice job for me by putting the title of the story into the title tag. It also puts it into a meta tag named "title", and the title (with slight variation) goes into the meta tag "description" as well.
Regarding those famously overused and popular meta tags, "keywords" and "description" - they carry much less weight than you might think. In fact, according to this article on meta tags in Wikipedia (last updated Nov 16, 2006), Google doesn't trust and doesn't use meta data for indexing, although the article does say that some researchers have found a link between visibility and page ranking. My advice? Make sure you have the "description" and "keywords" meta tags, but don't put your faith in them. Instead, pay attention to the title tag!
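To make the point concrete, here is a hypothetical page head (the title and content values are invented for illustration) showing the title tag alongside the meta tags. The title should describe the actual content of that specific page:

```html
<head>
  <!-- the title tag: arguably the most important on-page item for ranking -->
  <title>Caching Strategies in ColdFusion - Example Blog</title>
  <!-- easy to include, but don't count on them for ranking -->
  <meta name="description" content="When and how to cache query results in ColdFusion applications.">
  <meta name="keywords" content="coldfusion, caching, performance">
</head>
```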
The header above is nice and clean. It includes the things a header should include. You might notice what it does not include. It does not include a style block and it does not include excessive JavaScript. I have seen (and developed) many pages with a long style block in the header like this:
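The pattern described above, and its cleaner alternative, can be sketched like this (file names are hypothetical). Moving styles and scripts into external files keeps the header lean, so a spider reaches your actual content sooner:

```html
<!-- Bloated: a long style block buried in the header -->
<head>
  <title>My Page</title>
  <style type="text/css">
    body { margin: 0; font-family: Verdana, sans-serif; }
    /* ...hundreds more lines of rules... */
  </style>
</head>

<!-- Cleaner: link out to external files instead -->
<head>
  <title>My Page</title>
  <link rel="stylesheet" type="text/css" href="styles.css">
  <script type="text/javascript" src="scripts.js"></script>
</head>
```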
While we are on the subject of simple and clean, nothing is cleaner or simpler or easier to index than RSS. As search engines get more and more cluttered with the waste by-products of black-hat optimization, blogs and RSS are the "new" way to get to useful content. In my opinion, you should not consider building a traffic driven site without giving some thought to RSS as a supplement. Even if a search engine does not index your RSS you should remember that many sites are consuming RSS feeds. With RSS you increase the likelihood that some site will link to your site (as long as you have good content) - and that is definitely a marker that search engines take into account. If other sites link to yours, your page ranking will rise.
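For readers who haven't built one, a minimal RSS 2.0 feed is nothing more than clean, structured XML - titles, links, and descriptions with no presentational clutter (the URLs and titles below are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://www.example.com/</link>
    <description>Fresh, valuable content - exactly what consumers of feeds want</description>
    <item>
      <title>Part 2: Coding for Search Engines</title>
      <link>http://www.example.com/part-2</link>
      <description>How your markup affects indexing.</description>
    </item>
  </channel>
</rss>
```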
This little meta "robots" tag can be used to instruct a search engine spider what to do with your page. You have three indicators to work with: "index", "follow", and the "no" prefix that negates them ("noindex", "nofollow").
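Combining those indicators gives you the common variations:

```html
<!-- index this page and follow its links (the default behavior) -->
<meta name="robots" content="index, follow">

<!-- keep this page out of the index, but still crawl its links -->
<meta name="robots" content="noindex, follow">

<!-- neither index the page nor follow its links -->
<meta name="robots" content="noindex, nofollow">
```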
It should be noted, in regard to the meta "robots" tag and the robots.txt file, that more than one blog or webmaster resource I ran across put little faith in either of these methods. Their belief is that many spiders and agents simply ignore these instructions and index everything they can get their hands on. One other note: don't confuse the meta robots tag or the robots.txt file with the rel="nofollow" attribute that you sometimes see on a link. The rel="nofollow" attribute tells search engines not to count that link in their ranking calculations. Primarily it is used to cut down on the efficacy of link spammers. See this Wikipedia article on Spam in Blogs for a good explanation of the ins and outs of this technique.
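For contrast, here are the two things being distinguished - a link-level nofollow, and a robots.txt sketch (the URL and path are hypothetical):

```html
<!-- link-level nofollow: the link still works for visitors,
     but shouldn't pass ranking credit to its target -->
<a href="http://www.example.com/" rel="nofollow">a link I don't vouch for</a>
```

```
# robots.txt - placed at the site root; compliant spiders read it before crawling
User-agent: *
Disallow: /admin/
```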
In our next post we will continue the discussion on coding techniques and delve into how your page is constructed.