How Google works might as well be called magic, because no one outside Google, and very few people inside it, knows exactly how it works. The details are a closely guarded secret and they change constantly. But we do know some of it from statements Google has made and through observation.
The process that takes your content and makes it available to the world has three parts.
Google’s magic starts with the millions of web site owners, bloggers and hobbyists who create the internet’s content. When they press the button to publish, they set in motion a chain of events that can lead to internet success or failure.
Google may be magic and have vast banks of computers, but it cannot actually monitor everything in the world. So when new content is published, Google won't automatically know about it. Google finds out where the new content is in a number of different ways.
For a brand new domain (e.g. www.HappyNewYear2010.com), the simplest and quickest start is to tell Google about it.
For existing domains, Google will automatically detect new content by a process known as spidering. It regularly goes back to every URL it knows about, and adds any new content and links it finds into its index.
The spidering process is not limited to the domain being checked. So your new page could be added to Google’s index not because of the links on your own site, but because a page on someone else’s site has linked to your new content. This can be very useful, because how frequently Google checks your domain depends on how often it gets updated. Google may only re-index your site once a month if your site rarely changes. However, if your new content is featured on Digg, it will be added to Google almost straightaway, because Digg’s content is re-indexed by Google every hour.
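The link-following described above can be illustrated with a minimal sketch. This is not Google's actual crawler, which is not public. The toy web, the example.com-style URLs and the function name spider are all made up for illustration. The point it shows is that a new page with no links from its own site can still be found through a link on a frequently visited page elsewhere.

```python
from collections import deque

# Toy "web": each URL maps to the URLs it links to.
# Every URL here is hypothetical and exists only for this example.
TOY_WEB = {
    "http://popular.example/": [
        "http://popular.example/story",
        "http://yoursite.example/new-page",  # an outside site links to your new page
    ],
    "http://popular.example/story": [],
    "http://yoursite.example/": [],          # your own home page does not link to it yet
    "http://yoursite.example/new-page": [],
}


def spider(known_urls, web):
    """Revisit every known URL, follow its links, and return URLs not seen before."""
    seen = set(known_urls)
    queue = deque(known_urls)
    discovered = []
    while queue:
        url = queue.popleft()
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)  # newly found pages get checked for links too
    return discovered


new_pages = spider(["http://popular.example/", "http://yoursite.example/"], TOY_WEB)
print(new_pages)
```

In this sketch the new page is discovered only because the popular site links to it, which mirrors the Digg example above.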
To help the Google spider, use a Sitemap. This is a special file that tells Google’s spider about every URL on your site and how often each one is updated. It is an essential part of the web site owner’s tool bag. See Google’s Webmaster Tools.
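As a rough illustration of what a Sitemap contains, the sketch below builds one using Python's standard library. The element names (urlset, url, loc, changefreq) come from the public Sitemap protocol. The function name build_sitemap and the example.com URLs are assumptions made for this example, and a real site would normally generate the file with its CMS or a plugin.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML for a list of (url, change_frequency) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                # the page address
        ET.SubElement(url, "changefreq").text = changefreq  # a hint, e.g. "daily"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only to show the output format.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "daily"),
    ("http://www.example.com/about", "monthly"),
])
print(sitemap_xml)
```

The changefreq value is only a hint to the spider about how often a page changes. It does not guarantee how often the page is revisited.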
If you are using blogging tools such as TypePad or WordPress, then your content will almost certainly be added to Google’s index in a matter of hours. Recognising that blogging is fast paced, often reacting in minutes to world events, Google has worked hard to ensure new posts are added to the index as soon as possible.
Once Google has located new content, it stores a copy in its databanks for analysis. This is known as 'caching'. Through a variety of processes, it then indexes the new content, linking it to all the keywords that may apply to that page.
The indexing process also scores the page on a number of factors (see Google Rankings: Seven Things You Need To Know). Some of these are static, unchanging, such as whether the keyword appears in the URL. Others are dynamic and can alter over time. These include the number of links to the page and how many people actually visit the site via Google. Exactly how Google works out this score is a closely guarded secret.
The result of the indexing is that each page has a score for each keyword associated with it. The better the score, the higher up the results your page will be. As part of the spidering and indexing process, Google reassesses existing content and rescores it to reflect any changes.
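The idea of a score per page per keyword can be sketched as a small lookup table. Google's real scoring is secret, so everything here is an assumption for illustration: the function names, the sample pages, and especially the weights (one point per occurrence of the keyword, five points if it appears in the URL, one point per inbound link). Only the two kinds of factor, static and dynamic, come from the text above.

```python
from collections import defaultdict


def score_page(keyword, url, text, inbound_links):
    """Toy score combining a static factor and a dynamic one (weights are invented)."""
    score = text.lower().split().count(keyword)  # how often the keyword appears
    if keyword in url.lower():
        score += 5                               # static factor: keyword in the URL
    score += inbound_links                       # dynamic factor: links to the page
    return score


def build_index(pages):
    """Map each keyword to a {url: score} table for the pages that contain it."""
    index = defaultdict(dict)
    for url, (text, inbound_links) in pages.items():
        for keyword in set(text.lower().split()):
            index[keyword][url] = score_page(keyword, url, text, inbound_links)
    return index


# Hypothetical pages: url -> (page text, number of inbound links).
pages = {
    "http://a.example/mice": ("pet mice care guide", 3),
    "http://b.example/pets": ("mice and hamsters as pets", 10),
}
index = build_index(pages)
print(index["mice"])
```

Rescoring in this sketch is simply a matter of running build_index again once the link counts or page text have changed.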
The process of returning the results of a search is quick and easy. Google already knows every page relevant to your search term and how well each one scores, so it simply looks up the best scoring pages for the keyword. This is how Google can deliver hundreds of results in under a second.
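Because the scores are worked out in advance, answering a search is little more than a lookup and a sort. The sketch below assumes a precomputed table of keyword scores. The table contents, the URLs and the function name search are hypothetical and say nothing about how Google actually stores its index.

```python
# Hypothetical precomputed index: keyword -> {url: score}.
PRECOMPUTED_INDEX = {
    "mice": {
        "http://a.example/mice": 9,
        "http://b.example/pets": 11,
        "http://c.example/": 2,
    },
}


def search(keyword, index, limit=10):
    """Return the best scoring URLs for a keyword, highest score first."""
    scores = index.get(keyword.lower(), {})
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:limit]]


results = search("mice", PRECOMPUTED_INDEX, limit=2)
print(results)
```

No page is read or analysed at search time in this sketch. All the expensive work happened earlier, during spidering and indexing.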
Chris Tregenza runs a variety of web sites including MiceLife, and writes free SEO articles and an SEO blog.