Wednesday, May 14, 2008

Search Engine Optimization

Introduction to SEO

This document is intended for webmasters and site owners who want to investigate the issues of seo (search engine optimization) and promotion of their resources. It is mainly aimed at beginners, although I hope that experienced webmasters will also find something new and interesting here. There are many articles on seo on the Internet and this text is an attempt to gather some of this information into a single consistent document.

Information presented in this text can be divided into several parts:
- Clear-cut seo recommendations, practical guidelines.
- Theoretical information that we think any seo specialist should know.
- Seo tips, observations, recommendations from experience, other seo sources, etc.

Common search engine principles
To understand seo you need to be aware of the architecture of search engines. They all contain the following main components:

Spider - a browser-like program that downloads web pages.

Crawler – a program that automatically follows all of the links on each web page.

Indexer - a program that analyzes web pages downloaded by the spider and the crawler.

Database– storage for downloaded and processed pages.

Results engine – extracts search results from the database.

Web server – a server that is responsible for interaction between the user and other search engine components.

Spider. This program downloads web pages just like a web browser. The difference is that a browser displays the information presented on each page (text, graphics, etc.) while a spider does not have any visual components and works directly with the underlying HTML code of the page. You may already know that there is an option in standard web browsers to view source HTML code.

Crawler. This program finds all links on each page. Its task is to determine where the spider should go either by evaluating the links or according to a predefined list of addresses. The crawler follows these links and tries to find documents not already known to the search engine.

Indexer. This component parses each page and analyzes the various elements, such as text, headers, structural or stylistic features, special HTML tags, etc.

Database. This is the storage area for the data that the search engine downloads and analyzes. Sometimes it is called the index of the search engine.

Results Engine. The results engine ranks pages. It determines which pages best match a user's query and in what order the pages should be listed. This is done according to the ranking algorithms of the search engine. It follows that page rank is a valuable and interesting property and any seo specialist is most interested in it when trying to improve his site search results. In this article, we will discuss the seo factors that influence page rank in some detail.

Web server. The search engine web server usually contains a HTML page with an input field where the user can specify the search query he or she is interested in. The web server is also responsible for displaying search results to the user in the form of an HTML page.

Internal ranking factors


Several factors influence the position of a site in the search results. They can be divided into external and internal ranking factors. Internal ranking factors are those that are controlled by seo aware website owners (text, layout, etc.) and will be described next.

Amount of text on a page
A page consisting of just a few sentences is less likely to get to the top of a search engine list. Search engines favor sites that have a high information content. Generally, you should try to increase the text content of your site in the interest of seo. The optimum page size is 500-3000 words (or 2000 to 20,000 characters).

Search engine visibility is increased as the amount of page text increases due to the increased likelihood of occasional and accidental search queries causing it to be listed. This factor sometimes results in a large number of visitors.

Number of keywords on a page
Keywords must be used at least three to four times in the page text. The upper limit depends on the overall page size – the larger the page, the more keyword repetitions can be made. Keyword phrases (word combinations consisting of several keywords) are worth a separate mention. The best seo results are observed when a keyword phrase is used several times in the text with all keywords in the phrase arranged in exactly the same order. In addition, all of the words from the phrase should be used separately several times in the remaining text. There should also be some difference (dispersion) in the number of entries for each of these repeated words.

Let us take an example. Suppose we optimize a page for the phrase "seo software” (one of our seo keywords for this site) It would be good to use the phrase “seo software” in the text 10 times, the word “seo” 7 times elsewhere in the text and the word “software” 5 times. The numbers here are for illustration only, but they show the general seo idea quite well.

Keyword density and seo
Keyword page density is a measure of the relative frequency of the word in the text expressed as a percentage. For example, if a specific word is used 5 times on a page containing 100 words, the keyword density is 5%. If the density of a keyword is too low, the search engine will not pay much attention to it. If the density is too high, the search engine may activate its spam filter. If this happens, the page will be penalized and its position in search listings will be deliberately lowered.

The optimum value for keyword density is 5-7%. In the case of keyword phrases, you should calculate the total density of each of the individual keywords comprising the phrases to make sure it is within the specified limits. In practice, a keyword density of more than 7-8% does not seem to have any negative seo consequences. However, it is not necessary and can reduce the legibility of the content from a user’s viewpoint.

«TITLE» tag
This is one of the most important tags for search engines. Make use of this fact in your seo work. Keywords must be used in the TITLE tag. The link to your site that is normally displayed in search results will contain text derived from the TITLE tag. It functions as a sort of virtual business card for your pages. Often, the TITLE tag text is the first information about your website that the user sees. This is why it should not only contain keywords, but also be informative and attractive. You want the searcher to be tempted to click on your listed link and navigate to your website. As a rule, 50-80 characters from the TITLE tag are displayed in search results and so you should limit the size of the title to this length.

Description Meta tag
This is used to specify page descriptions. It does not influence the seo ranking process but it is very important. A lot of search engines (including the largest one – Google) display information from this tag in their search results if this tag is present on a page and if its content matches the content of the page and the search query.

Experience has shown that a high position in search results does not always guarantee large numbers of visitors. For example, if your competitors' search result description is more attractive than the one for your site then search engine users may choose their resource instead of yours. That is why it is important that your Description Meta tag text be brief, but informative and attractive. It must also contain keywords appropriate to the page.

Keywords Meta tag
This Meta tag was initially used to specify keywords for pages but it is hardly ever used by search engines now. It is often ignored in seo projects. However, it would be advisable to specify this tag just in case there is a revival in its use. The following rule must be observed for this tag: only keywords actually used in the page text must be added to it.

Why inbound links to sites are taken into account
As you can see from the previous section, many factors influencing the ranking process are under the control of webmasters. If these were the only factors then it would be impossible for search engines to distinguish between a genuine high-quality document and a page created specifically to achieve high search ranking but containing no useful information. For this reason, an analysis of inbound links to the page being evaluated is one of the key factors in page ranking. This is the only factor that is not controlled by the site owner.

It makes sense to assume that interesting sites will have more inbound links. This is because owners of other sites on the Internet will tend to have published links to a site if they think it is a worthwhile resource. The search engine will use this inbound link criterion in its evaluation of document significance.

Therefore, two main factors influence how pages are stored by the search engine and sorted for display in search results:

- Relevance, as described in the previous section on internal ranking factors.

- Number and quality of inbound links, also known as link citation, link popularity or citation index. This will be described in the next section.

Google PageRank – theoretical basics
The Google company was the first company to patent the system of taking into account inbound links. The algorithm was named PageRank. In this section, we will describe this algorithm and how it can influence search result ranking.

PageRank is estimated separately for each web page and is determined by the PageRank (citation) of other pages referring to it. It is a kind of “virtuous circle.” The main task is to find the criterion that determines page importance. In the case of PageRank, it is the possible frequency of visits to a page.

I shall now describe how user’s behavior when following links to surf the network is modeled. It is assumed that the user starts viewing sites from some random page. Then he or she follows links to other web resources. There is always a possibility that the user may leave a site without following any outbound link and start viewing documents from a random page. The PageRank algorithm estimates the probability of this event as 0.15 at each step. The probability that our user continues surfing by following one of the links available on the current page is therefore 0.85, assuming that all links are equal in this case. If he or she continues surfing indefinitely, popular pages will be visited many more times than the less popular pages.

The PageRank of a specified web page is thus defined as the probability that a user may visit the web page. It follows that, the sum of probabilities for all existing web pages is exactly one because the user is assumed to be visiting at least one Internet page at any given moment.

Since it is not always convenient to work with these probabilities the PageRank can be mathematically transformed into a more easily understood number for viewing. For instance, we are used to seeing a PageRank number between zero and ten on the Google Toolbar.

According to the ranking model described above:
- Each page on the Net (even if there are no inbound links to it) initially has a PageRank greater than zero, although it will be very small. There is a tiny chance that a user may accidentally navigate to it.
- Each page that has outbound links distributes part of its PageRank to the referenced page. The PageRank contributed to these linked-to pages is inversely proportional to the total number of links on the linked-from page – the more links it has, the lower the PageRank allocated to each linked-to page.
- PageRank A “damping factor” is applied to this process so that the total distributed page rank is reduced by 15%. This is equivalent to the probability, described above, that the user will not visit any of the linked-to pages but will navigate to an unrelated website.

Let us now see how this PageRank process might influence the process of ranking search results. We say “might” because the pure PageRank algorithm just described has not been used in the Google algorithm for quite a while now. We will discuss a more current and sophisticated version shortly. There is nothing difficult about the PageRank influence – after the search engine finds a number of relevant documents (using internal text criteria), they can be sorted according to the PageRank since it would be logical to suppose that a document having a larger number of high-quality inbound links contains the most valuable information.

Thus, the PageRank algorithm "pushes up" those documents that are most popular outside the search engine as well.

Creating correct content
The content of a site plays an important role in site promotion for many reasons. We will describe some of them in this section. We will also give you some advice on how to populate your site with good content.

- Content uniqueness. Search engines value new information that has not been published before. That is why you should compose own site text and not plagiarize excessively. A site based on materials taken from other sites is much less likely to get to the top in search engines. As a rule, original source material is always higher in search results.

- While creating a site, remember that it is primarily created for human visitors, not search engines. Getting visitors to visit your site is only the first step and it is the easiest one. The truly difficult task is to make them stay on the site and convert them into purchasers. You can only do this by using good content that is interesting to real people.

- Try to update information on the site and add new pages on a regular basis. Search engines value sites that are constantly developing. Also, the more useful text your site contains, the more visitors it attracts. Write articles on the topic of your site, publish visitors' opinions, create a forum for discussing your project. A forum is only useful if the number of visitors is sufficient for it to be active. Interesting and attractive content guarantees that the site will attract interested visitors.

- A site created for people rather than search engines has a better chance of getting into important directories such as DMOZ and others.

- An interesting site on a particular topic has much better chances to get links, comments, reviews, etc. from other sites on this topic. Such reviews can give you a good flow of visitors while inbound links from such resources will be highly valued by search engines.

- As final tip…there is an old German proverb: "A shoemaker sticks to his last" which means, "Do what you can do best.” If you can write breathtaking and creative textual prose for your website then that is great. However, most of us have no special talent for writing attractive text and we should rely on professionals such as journalists and technical writers. Of course, this is an extra expense, but it is justified in the long term.