How Search Engine Spiders Work

There are hundreds of search engines available today, but some are far more complex than others.

This article will give you an overview of how some of the most popular ones work.

Let’s start with a smaller engine: InfoSeek. It indexes only about 200 words of your web page, so it’s important to make sure that you have meta tags on your site and that the most important things are listed first. The information you put in your meta tags will be used to display a description of your site, and most meta tags can hold about 200 characters of text. The keywords meta tag, however, can hold up to 1,000 characters.
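
To make these numbers concrete, here is a minimal Python sketch that checks meta tag text against the limits quoted above and approximates what an engine that reads only about 200 words would see. It assumes those 200 words are the first ones on the page; the function names and warning format are hypothetical, and the limits are this article’s figures for InfoSeek, not a published specification.

```python
from typing import List

# Limits quoted in this article for InfoSeek-era meta tags and indexing.
# They are rough historical figures, not a published specification.
DESCRIPTION_LIMIT = 200  # characters in a typical meta tag
KEYWORDS_LIMIT = 1000    # characters in the keywords meta tag
INDEXED_WORDS = 200      # words of page text InfoSeek is said to index


def check_meta(description: str, keywords: str) -> List[str]:
    """Return a warning for each meta tag whose text exceeds the quoted limit."""
    warnings = []
    if len(description) > DESCRIPTION_LIMIT:
        warnings.append(f"description: {len(description)} chars > {DESCRIPTION_LIMIT}")
    if len(keywords) > KEYWORDS_LIMIT:
        warnings.append(f"keywords: {len(keywords)} chars > {KEYWORDS_LIMIT}")
    return warnings


def indexed_portion(body_text: str, limit: int = INDEXED_WORDS) -> str:
    """Approximate what an engine that reads only the first `limit` words sees.

    Assumption: the engine keeps the first words of the page, not a sample.
    """
    return " ".join(body_text.split()[:limit])


print(check_meta("Handmade oak furniture from Vermont.", "oak, furniture, handmade"))
# → []
print(len(indexed_portion("word " * 500).split()))
# → 200
```

Under this model, anything past the 200th word is simply invisible to the engine, which is why the most important terms belong at the top.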

These simple rules are worth keeping in mind for all search engines. The more important the information, the closer it should be to the beginning of your meta tags, or even to the beginning of your site’s content. Many search engines ignore meta tags entirely, so it is important that your body text contains the same information as your meta tags (although you obviously cannot simply enter long lists of keywords, as this would harm your site’s content).
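
Whether your body text really mirrors your meta tags, and how early the important terms appear, can be checked mechanically. The sketch below uses Python’s standard `html.parser` module to pull the keywords meta tag and the visible body words from a page, then reports the position of each keyword’s first appearance in the body (`None` means it never appears). The class and function names are hypothetical, and no search engine is known to score pages exactly this way.

```python
from html.parser import HTMLParser


class MetaAndBodyParser(HTMLParser):
    """Collects the keywords meta tag and the visible words inside <body>."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.body_words = []
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text a visitor would see in the page body.
        if self._in_body and not self._skip_depth:
            for word in data.split():
                cleaned = word.lower().strip(".,;:!?\"'()")
                if cleaned:
                    self.body_words.append(cleaned)


def keyword_positions(html: str) -> dict:
    """Map each meta keyword to the index of its first body occurrence, or None."""
    parser = MetaAndBodyParser()
    parser.feed(html)
    parser.close()
    words = parser.body_words
    positions = {}
    for keyword in parser.keywords:
        parts = keyword.split()  # multi-word keywords must match consecutively
        positions[keyword] = next(
            (i for i in range(len(words) - len(parts) + 1)
             if words[i:i + len(parts)] == parts),
            None,
        )
    return positions


page = """<html><head>
<meta name="keywords" content="oak furniture, Vermont, shipping">
</head><body><h1>Oak furniture</h1><p>Handmade in Vermont.</p></body></html>"""
print(keyword_positions(page))
# → {'oak furniture': 0, 'vermont': 4, 'shipping': None}
```

Here `shipping` is promoted in the meta tag but never mentioned on the page, which is exactly the kind of mismatch that engines ignoring meta tags would never reward.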

The AltaVista search engine sends Scooter, its spider, to check out your entire site. Scooter can take as long as three months to spider and fully index a site, whereas the average spider takes only 6-8 weeks. Scooter normally spiders somewhere between two and ten pages from your site each week. This means that the longer your website has been around, the better it will be indexed, which is an example of search engines applying Darwin’s theory of survival of the fittest.
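
The weekly crawl rate explains these long indexing times. Assuming Scooter visits a steady two to ten pages per week and does not revisit a page until it has covered the whole site (a simplification), the time for a full crawl is the page count divided by the rate, rounded up. The helper names below are hypothetical.

```python
import math
from typing import Tuple

# Crawl rates quoted in the article for AltaVista's Scooter (pages per week).
SCOOTER_MIN_RATE = 2
SCOOTER_MAX_RATE = 10


def weeks_to_crawl(site_pages: int, pages_per_week: int) -> int:
    """Weeks to visit every page once at a steady rate (rounded up)."""
    if pages_per_week <= 0:
        raise ValueError("pages_per_week must be positive")
    return math.ceil(site_pages / pages_per_week)


def scooter_crawl_range(site_pages: int) -> Tuple[int, int]:
    """(fastest, slowest) full-crawl estimate in weeks at Scooter's quoted rates."""
    return (weeks_to_crawl(site_pages, SCOOTER_MAX_RATE),
            weeks_to_crawl(site_pages, SCOOTER_MIN_RATE))


print(scooter_crawl_range(25))  # → (3, 13)
```

At the slow end, a 25-page site takes 13 weeks, roughly the three months quoted above; under this model a larger site takes proportionally longer.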

Excite used to be a search powerhouse, but it has since been dropped as the search provider for AOL and Netscape, so it is less important than it once was. The algorithm it uses to determine keyword relevance is very complicated: it indexes your pages and then attempts to summarize them by selecting only the most relevant sentences. Expect your pages to be reviewed roughly once every two weeks. Keep in mind, though, that meta tags carry no weight in Excite’s rankings, even though it will use your description tag as long as its words are relevant to your pages’ content.
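
The article does not say how Excite chooses its "most relevant sentences". As a hedged illustration of extractive summarization in general, the sketch below scores each sentence by the average page-wide frequency of its non-trivial words and keeps the top few in their original order. The stop-word list, scoring rule, and function name are assumptions; this is a generic technique, not Excite’s algorithm.

```python
import re
from collections import Counter
from typing import List

# A tiny stop list so common function words do not dominate the scores.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "for", "on", "with", "as", "that", "this", "be", "by"}


def summarize(text: str, max_sentences: int = 2) -> List[str]:
    """Pick the highest-scoring sentences, returned in document order.

    Each sentence is scored by the average document-wide frequency of its
    non-stop words: a generic extractive technique, not Excite's method.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words_per_sentence = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    freq = Counter(w for words in words_per_sentence for w in words)

    def score(i: int) -> float:
        words = words_per_sentence[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by score (the sort is stable, so ties keep document order),
    # keep the top few, then restore document order for readability.
    ranked = sorted(range(len(sentences)), key=score, reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(ranked)]


text = ("Oak furniture lasts for decades. We build oak furniture by hand. "
        "Our office cat is named Max. Every oak piece ships from Vermont.")
print(summarize(text))
# → ['Oak furniture lasts for decades.', 'We build oak furniture by hand.']
```

The off-topic sentence about the cat scores lowest because none of its words recur elsewhere on the page, which is the intuition behind keeping a page focused on its real subject.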

Let’s move on to Lycos. Lycos has fully integrated the Open Directory Project (ODP) into its main results pages, and it also uses search results from AllTheWeb. Lycos also runs click-throughs to its sister site, HotBot. Lycos is one of the harder search engines to understand, because its submission pages say one thing while it indexes your site in a completely different way. As a general rule of thumb, your site will be indexed by Lycos in due time as long as you get indexed in the ODP and AllTheWeb.

Even though WebCrawler is owned by Excite, it still has its own search engine and indexer. If you are listed with WebCrawler, try to stay listed, as it isn’t the easiest search engine to get into. Its hit-and-miss standards, combined with its sporadic indexing, make the submission process tough, although not impossible.

The biggest player is, of course, Google, which uses a page ranking system as the central basis of its index. It was once nearly impossible to manipulate this system to drive up your rankings, but people quickly figured out that the more links to their site they could generate across the rest of the net, the better Google ranked them. Google is not thought to use context-sensitive rankings. Context-sensitive information is used at Yahoo!, LookSmart and the ODP, however, and Google regularly spiders those sites when it re-indexes its own database.
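
The link-counting effect described above comes from PageRank-style scoring, in which each page passes a share of its own score to the pages it links to. Below is a minimal sketch of the classic, publicly described PageRank formulation (power iteration with the commonly cited damping factor of 0.85). Google’s production ranking combines many more signals, so treat this as an illustration of the idea rather than Google’s system; the five-page link graph is made up.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Classic PageRank computed by power iteration.

    `links` maps each page to the pages it links to. A page with no outgoing
    links ("dangling") spreads its score evenly over every page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical five-page web: three pages link to "hub".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub", "d"], "hub": ["a"], "d": []}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph `hub` comes out on top because three pages link to it, while `a` ranks a close second with only one inbound link, because that link comes from the top-ranked page. In this model a link’s value depends on the score of the page it comes from, not just on how many links there are.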

MSN is another important search engine. The holy trinity of search engines at the moment is Google, Yahoo! and MSN; together, these three provide the vast majority of the traffic you will receive from search engines. MSN will generally be the first search engine to index your site, and it will almost certainly list the most pages in the shortest time.

Although no one can tell you exactly when you will be indexed on any search engine, it’s best to check back at least weekly. Whatever you do, though, don’t resubmit your site more often than every two months or so; if you do, you might not get indexed at all.
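
The timing advice above amounts to two simple date rules. The sketch below encodes them with Python’s standard `datetime` module, approximating "every two months or so" as 60 days; that threshold and the function names are assumptions for illustration, not a documented limit from any search engine.

```python
from datetime import date, timedelta
from typing import List

# "Every two months or so", approximated as 60 days (an assumption).
MIN_RESUBMIT_GAP = timedelta(days=60)
CHECK_INTERVAL = timedelta(weeks=1)


def can_resubmit(last_submitted: date, today: date) -> bool:
    """True once at least MIN_RESUBMIT_GAP has passed since the last submission."""
    return today - last_submitted >= MIN_RESUBMIT_GAP


def check_dates(start: date, until: date) -> List[date]:
    """Weekly dates after `start`, up to and including `until`, to check listings."""
    dates = []
    d = start + CHECK_INTERVAL
    while d <= until:
        dates.append(d)
        d += CHECK_INTERVAL
    return dates


submitted = date(2024, 1, 1)
print(can_resubmit(submitted, date(2024, 2, 1)))      # → False (31 days)
print(can_resubmit(submitted, date(2024, 3, 1)))      # → True (60 days)
print(len(check_dates(submitted, date(2024, 3, 1))))  # → 8
```

In other words, you would check your listings about eight times before the first resubmission becomes safe.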