7/15/10

How Search Engines Work - 2: Indexing and Ranking (Google Caffeine, Google Page Rank)

Once spiders (or bots) crawl your site, they take the content information they've discovered on your web site and place that information in a database.  In an organized fashion, of course.  This is called "indexing." 

Indexing Involves Organization of the Information Amassed by the Spiders

Your content will be indexed according to how informative and helpful the spider or bot determines it to be for a particular topic, when compared to the other sites that it has crawled and placed within its database.  Exactly how things are indexed is a big, big trade secret for each of the search engines. 
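Since the real indexing methods are secret, the best we can do is look at the textbook idea behind them.  The sketch below builds a simple "inverted index," which maps each word to the pages that contain it.  This is an illustration only, not how Google or any other search engine actually does it; the page URLs, the sample text, and the function names (tokenize, build_index, search) are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages (URL -> text), invented for illustration.
pages = {
    "example.com/a": "Texas personal injury lawyer blog",
    "example.com/b": "Search engine indexing explained",
    "example.com/c": "Lawyer blog about search engine ranking",
}

index = build_index(pages)
print(sorted(search(index, "lawyer blog")))
```

The point of organizing things this way is speed: the search engine looks up the query words in the index instead of rereading every page it has ever crawled.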

Especially Google.  In fact, last summer Google announced Google Caffeine -- which, according to Matt Cutts, is primarily a new approach to Google indexing, although it also involves changes to Google crawling and ranking features. 

Last month, Google Caffeine officially debuted as a "new web indexing system" that the Google blog describes as:
"50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before."

In litigation terms, the spiders undertake discovery, bring back all the facts relevant to the subject matter, and proceed to categorize them for easy use and retrieval by the search engine's clientele.  Major witnesses and documents will be given priority over inconsequential sources of data that discovery has obtained.  That's where ranking comes into play. 

Ranking Involves Secret Decision-Making Protocols That Decide Who Gets Top Billing

Once the search engine has gathered all the data from the web, and then segregated that data according to subject matter, the major decisions must be made.  Which sites go where in the search results?  Which sites are going to be recommended to the search engine's clientele as the best sites in response to the client's search request? 

This is very important to the search engine, because its clientele depend upon its top 5 or 10 results to be the most informative sites regarding the query that has been made.  Bad search results, and the client can always use another search engine.  Google hasn't cornered the market because of its motto or the fact that its employees get to bring their dogs to work.  No.  Google has cornered the search engine market because it has been able to please more clients than the other search engines right here -- in how it ranks results.

Google Ranking Tactics: Page Rank Technology and Hypertext-Matching Analysis

Google in some ways doesn't try to hide the ball on ranking in its results.  According to its own Corporate Information, ranking is the result of:
1.  Page Rank Technology
"PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results...." and
2.  Hypertext-Matching Analysis
"Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query."
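Google's production ranking formula is not public, but the original PageRank idea has been described openly: a page is important if important pages link to it.  The sketch below is a minimal version of that published, link-based idea, assuming a tiny made-up web of four pages.  The page names, the damping value of 0.85, and the number of iterations are assumptions chosen for illustration, and nothing here reflects the "500 million variables" Google mentions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeatedly passing score along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site structure, invented for illustration.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ends up with the highest score because every other page links to it, and "orphan" ends up with the lowest because nothing links to it at all.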

Search Engine Ranking and SEO (Search Engine Optimization)

No search engine freely discloses how it makes its ranking decisions.  And, of course, savvy web surfers know that true research means using more than one search engine -- because they rank sites differently.  The same search in Google may not give the same results in Yahoo or Bing. 

Ranking isn't the last stop in search engine inner workings.  However, crawling, indexing, and ranking are the basics one needs to understand when writing blogs or web site content for the web. 

You're not just writing for your intended reader, be it a colleague, a referring attorney, or a potential client.  You're also writing to please the spiders -- if you want to be indexed and ranked successfully in the search engines.  And most do.

Search Engine Optimization

This is where the field of search engine optimization comes into play.  SEO is the practice of planning how to place a web site in the top results of the various search engines, based on an understanding of the crawling, indexing, and ranking techniques those engines use.

SEO involves many things.  Content, design, coding, advertising, and more.  SEO can involve paid marketing strategies (pay per click, etc.).  SEO can include design strategies implemented with certain search engine policies in mind. 

SEO can incorporate tactics within the coding of the site ("source code" or "HTML code") -- which may or may not be visible to the site visitor.  Hidden coding is one example of "black hat" SEO, which the search engines do not welcome.  For example, hiding favorable keywords or key phrases within the coding of the site -- where they remain unseen by the reader of the content -- is a clever optimization trick that will get a website penalized. 
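To make the hidden-keyword trick concrete, here is a rough sketch of how a program could spot text that a page hides from its readers.  The search engines do not disclose how they actually detect this, so treat it as an illustration only.  It assumes well-formed HTML and checks just two inline styles (display:none and visibility:hidden); the class name HiddenTextFinder and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Collect text inside elements whose inline style hides them."""

    # Tags that never have a closing tag, so they are not tracked.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []        # one True/False per open element: is it hidden?
        self.hidden_text = []  # text the visitor would never see

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hides = "display:none" in style or "visibility:hidden" in style
        # Anything nested inside a hidden element is hidden as well.
        inherited = bool(self.stack) and self.stack[-1]
        self.stack.append(hides or inherited)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            self.hidden_text.append(text)


# Hypothetical page with keywords stuffed into an invisible block.
html = """
<html><body>
<p>We handle personal injury cases.</p>
<div style="display: none">lawyer lawyer best lawyer cheap lawyer</div>
</body></html>
"""

finder = HiddenTextFinder()
finder.feed(html)
print(finder.hidden_text)
```

The visible paragraph passes untouched, while the stuffed keywords in the invisible block are flagged.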

SEO can also include optimizing the content placed within the site to make it "search engine friendly."  Here, keywords and key phrases are placed within the content, and other SEO strategies are applied, to encourage high rankings for the site itself.