7/12/10

How Search Engines Work - 1: Spiders that Crawl (Yahoo! Slurp, GoogleBot, BingBot)

Do you want your web site or blog to appear in the top search results at Google, Yahoo!, Bing, Ask, etc.?  To accomplish that goal, you need to know a bit about how search engines work.  Let's start with crawling.

What are spiders in search engines?

Each search engine has its own software tools that crawl the web (let's ignore the fact that the Internet and the World Wide Web are not synonymous for now), looking for content to offer its users in search results.

Yahoo! Slurp is the name of Yahoo!'s web crawler.  Googlebot is the name of Google's spider.  Right now, Bing is using MSN's old msnbot, but starting in October 2010, Bing will crawl the web using its new, fancy BingBot.  (BingBot is in beta now.)
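If you're curious which of these spiders has been visiting you, each one announces itself in the User-Agent header of its requests, which ends up in your server logs.  Here's a minimal Python sketch of spotting them.  The function name, the token table, and the sample strings are my own illustrations, and real User-Agent strings change over time, so treat this as a rough idea rather than an official list.

```python
# Substring tokens assumed to appear in each crawler's User-Agent header.
# This table is illustrative; check each search engine's own documentation.
CRAWLER_TOKENS = {
    "googlebot": "Googlebot",
    "slurp": "Yahoo! Slurp",
    "bingbot": "BingBot",
    "msnbot": "msnbot",
}


def identify_crawler(user_agent):
    """Return the crawler's name for a User-Agent string, or None if unrecognized."""
    ua = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in ua:
            return name
    return None


# Sample strings for illustration only.
print(identify_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(identify_crawler("Mozilla/5.0 (Windows NT 6.1) ordinary browser"))
```

Keep in mind that anyone can fake a User-Agent string, so a match here only tells you what the visitor claims to be.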

These "spiders" (also known as software robots or just "bots") are really computer programs that review your site, analyzing all the content on your web page or blog.  Your content is marked up in HTML so that the spider can read it.  That markup can be written by you or generated automatically by your software, as Blogger is doing for me right now as I type in Compose mode.

The spiders go through the content, word by word.  Or almost word by word.  (Their overall, global job is to find and organize every word they find on the Web, building lists of these words -- that's the big Web Crawl function.)  Google may exclude little words like "a," "the," etc., while AltaVista indexes every word found on a page.  No search engine's crawler works exactly the way its competitors' crawlers do.
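To make the "lists of words" idea concrete, here is a toy Python sketch of a word index, with a switch for skipping the little words.  Everything in it is my own invention for illustration (the stop-word list, the function name, the two sample pages); real search engines do something far more elaborate and don't publish the details.

```python
import re
from collections import defaultdict

# Hypothetical list of "little words" to skip; not any search engine's real list.
STOP_WORDS = {"a", "an", "the", "of", "and"}


def build_index(pages, skip_stop_words=True):
    """Map each word to the set of page URLs where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if skip_stop_words and word in STOP_WORDS:
                continue
            index[word].add(url)
    return index


pages = {
    "page1": "The spider crawls a page",
    "page2": "A page of links",
}
selective = build_index(pages)                         # skips little words
everything = build_index(pages, skip_stop_words=False) # indexes every word
print(sorted(selective))
print(sorted(everything))
```

The two calls mirror the two approaches described above: one index leaves out the little words, the other keeps every word it finds.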

The spiders also follow your links.  The spiders, or bots, will check out both the internal links that connect your content to other pages on your site and the external links that connect your content to outside sources of information.

This is called "crawling" your site. 
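Here is a rough Python sketch of how a program might pull the links out of a page and sort them into internal and external.  It uses only Python's standard library.  The class name, the function name, and the example.com addresses are made up for illustration, and treating "same host name" as "internal" is my simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Return (internal, external) lists of absolute URLs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        # Assumption: a link is "internal" when its host matches the page's host.
        if urlparse(absolute).netloc == site:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


html = '<a href="/about.html">About</a> <a href="http://example.org/source">Source</a>'
print(classify_links("http://example.com/index.html", html))
```

A real crawler would then queue up each of those addresses and repeat the process on the pages it finds there.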

And, those spiders don't do this just once.  No, no.  They'll be back.  They'll pop back over and check your site periodically, just to see if things have changed.  Have you added new content?  Do your links still work?  They'll also be looking at how your words are being used on the page:  titles, subtitles, headings, meta tags, etc. will be given a special tip of the hat as the search engine prioritizes your site. 
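To see what "titles, headings, meta tags" look like from the spider's side, here is a small Python sketch that picks those pieces out of a page.  The class name and the sample page are hypothetical, and how much weight any search engine gives each piece is not something this sketch (or I) can tell you.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Pick out the title, headings, and named meta tags from a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.meta = {}
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title" or tag in self.HEADINGS:
            self._current = tag
            if tag in self.HEADINGS:
                self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADINGS:
            self.headings[-1] += data


page = ('<html><head><title>My Blog</title>'
        '<meta name="description" content="About spiders"></head>'
        '<body><h1>Crawling</h1><p>Body text.</p><h2>Links</h2></body></html>')
signals = PageSignals()
signals.feed(page)
print(signals.title, signals.headings, signals.meta)
```

Running something like this against your own pages is a quick way to check that your titles and headings actually say what your content is about.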

Why do you care about spiders that crawl?  Because they teach you that it's important to (1) have content; (2) have sufficient content - 250 words or more; (3) have quality content; and (4) have links that work so that the spiders have something to index and rank from your site.
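Two items on that checklist, the 250-word minimum and links that work, are easy to check by machine.  Below is a hedged Python sketch.  The function names are mine, and link_works is a stand-in you would supply yourself (for example, a function that requests the address and checks the response); here I fake it so the example runs without touching the network.  Quality, of course, is not something a word count can measure.

```python
import re

MIN_WORDS = 250  # the threshold mentioned in this post


def count_words(text):
    """Count the words in a block of text."""
    return len(re.findall(r"\b\w+\b", text))


def content_report(text, links, link_works):
    """Summarize word count and broken links for one page.

    link_works is a caller-supplied function that takes a URL and
    returns True if the link can be reached (hypothetical helper).
    """
    words = count_words(text)
    broken = [url for url in links if not link_works(url)]
    return {
        "word_count": words,
        "long_enough": words >= MIN_WORDS,
        "broken_links": broken,
    }


# Fake link checker for illustration: pretend any URL ending in "gone" is dead.
def fake_check(url):
    return not url.endswith("gone")


report = content_report(
    "word " * 300,
    ["http://example.com/ok", "http://example.com/gone"],
    fake_check,
)
print(report)
```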

Index? Rank?  More on that in the next post. 
