Issue 2 of Writing for the web
To understand SEO, you need to understand search engines. A search engine is trying to solve two distinct problems:
1) If someone searches for a keyword, which pages are about that keyword?
2) Of the pages that are about the keyword, which is the most relevant to the user (i.e. which should be listed first)?
This might not sound difficult, and if there were only a handful of pages on the Internet it wouldn't be. The sheer number of websites and the amount of content on the web, however, make the task enormous: how do you tell which of the billions of pages online are relevant at all, and which of those are the most relevant?
To tackle this, search engines send out programs called "spiders" (or "crawlers") to "crawl" websites. A spider goes from link to link and extracts information about each page it visits, such as the page's content, its links to other pages, and anything else the search engine may find useful. The spider downloads each page and stores it on the search engine's own servers, building a huge catalogue of web pages called the index.
A second program, known as the indexer, then extracts information from each stored page: the words it contains, where those words appear on the page, any extra weighting given to specific words, and all the links the page contains. Those links are placed into a scheduler so that they can be crawled at a later date.
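The spider, indexer and scheduler described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real search engine is built: the "web" is a hypothetical in-memory dictionary (the `.example` URLs are made up), "downloading" is a dictionary lookup, and the index is a simple inverted index mapping each word to the pages and positions where it occurs.

```python
from collections import deque, defaultdict

# Hypothetical mini-web: URL -> (page text, outgoing links).
# A real spider would fetch pages over HTTP instead.
WEB = {
    "a.example/": ("cheap flights to paris", ["a.example/deals", "b.example/"]),
    "a.example/deals": ("flight deals and cheap hotels", ["a.example/"]),
    "b.example/": ("paris travel guide", ["c.example/"]),
    "c.example/": ("guide to paris museums", []),
}

def index_page(url, page, index):
    """Indexer: record each word and its position, then return the page's links."""
    text, links = page
    for position, word in enumerate(text.split()):
        index[word].append((url, position))
    return links

def crawl_and_index(start_url):
    stored = {}                      # the search engine's own copy of each page
    index = defaultdict(list)        # word -> list of (url, position)
    scheduler = deque([start_url])   # links waiting to be crawled later
    while scheduler:
        url = scheduler.popleft()
        if url in stored or url not in WEB:
            continue                 # already crawled, or not reachable
        stored[url] = WEB[url]       # spider: "download" and store the page
        links = index_page(url, stored[url], index)
        scheduler.extend(links)      # extracted links go to the scheduler
    return stored, index

pages, index = crawl_and_index("a.example/")
print(sorted(pages))   # ['a.example/', 'a.example/deals', 'b.example/', 'c.example/']
print(index["paris"])  # [('a.example/', 3), ('b.example/', 0), ('c.example/', 2)]
```

Storing word positions, not just word presence, is what later lets an engine tell whether query words appear close together on a page.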
All this information is then used to determine which pages in the index are relevant and which ones are the most relevant whenever a search is performed.
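The two problems from the start of this issue map onto two steps at query time: find the pages that match, then order them. The sketch below is a deliberately simplified illustration, assuming hypothetical pages and a toy relevance score (how often the query words occur on the page); real search engines combine many more signals, some of which are described below.

```python
from collections import defaultdict

# Hypothetical stored pages: URL -> text.
PAGES = {
    "a.example/": "cheap flights to paris cheap hotels in paris",
    "b.example/": "paris travel guide",
    "c.example/": "cheap car hire",
}

def build_index(pages):
    """Inverted index: word -> {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    words = query.split()
    if not words:
        return []
    # Problem 1: which pages are about the query? (pages containing every word)
    matching = set.intersection(*(set(index.get(w, {})) for w in words))
    # Problem 2: which are the most relevant? Toy score: total occurrences,
    # with the URL as a tie-breaker so the order is deterministic.
    def score(url):
        return sum(index[w][url] for w in words)
    return sorted(matching, key=lambda url: (-score(url), url))

index = build_index(PAGES)
print(search(index, "paris"))        # ['a.example/', 'b.example/']
print(search(index, "cheap paris"))  # ['a.example/']
```

Notice that this toy score rewards simply repeating a word, which is exactly the weakness the rest of this issue describes.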
Webmasters and content providers began optimising sites for search engines in the mid-1990s, when the very first search engines were cataloguing the early web. Site owners started to recognise the value of having their sites highly ranked and visible in search engine results, which created an opportunity for both white hat and black hat SEO practitioners.
Early search engines, whose mathematical algorithms tried to determine which sites were the most relevant, relied on webmaster-provided information such as the keywords meta tag. Meta tags are meant to provide a guide to each page's content. Indexing pages by their metadata proved unreliable, however, because the keywords a webmaster chose could misrepresent the site's actual content. Inaccurate, incomplete, and inconsistent meta tag data could and did cause pages to rank for irrelevant searches. Web content providers also manipulated a number of attributes within a page's HTML source in an attempt to rank well in search engines.
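The weakness of the keywords meta tag is that nothing ties it to what is actually on the page. The sketch below, using Python's standard `html.parser` module, reads a hypothetical page's keywords meta tag and reports any keyword whose words never appear in the page's text. It is only an illustration of the mismatch; it is not a claim about how early search engines processed meta tags.

```python
import re
from html.parser import HTMLParser

class KeywordCheck(HTMLParser):
    """Collects the keywords meta tag and every word of text on a page."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.page_words = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords = [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        self.page_words.update(re.findall(r"[a-z0-9]+", data.lower()))

def unsupported_keywords(html):
    """Return meta keywords with at least one word missing from the page's text."""
    parser = KeywordCheck()
    parser.feed(html)
    parser.close()
    return [kw for kw in parser.meta_keywords
            if not all(word in parser.page_words for word in kw.split())]

# Hypothetical page whose meta tag claims a keyword the content never mentions.
PAGE = """<html><head>
<meta name="keywords" content="paris, cheap flights, free iphone">
<title>Paris travel guide</title>
</head><body><p>Museums and cheap flights to Paris.</p></body></html>"""

print(unsupported_keywords(PAGE))  # ['free iphone']
```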
By 1997, search engines recognised that webmasters were working to rank well in search results, and that some were even manipulating rankings by stuffing pages with excessive or irrelevant keywords. Early search engines such as AltaVista and Infoseek adjusted their algorithms in an effort to prevent webmasters from manipulating rankings.
Because they relied on factors such as keyword density, which were entirely within a webmaster's control, early search engines suffered from abuse and ranking manipulation. To provide better results, search engines had to make sure their result pages showed the most relevant pages rather than unrelated pages stuffed with keywords. A search engine's success and popularity depend on its ability to return the most relevant results for any given search, so irrelevant or inaccurate results would drive users to a competitor.
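Keyword density is usually defined as the number of times a keyword appears divided by the total number of words on the page, expressed as a percentage. The sketch below assumes that common definition and a single-word keyword (tools differ on how they count phrases). The two sample sentences are made up, but they show why the measure was so easy to game: the webmaster only has to repeat the word.

```python
import re

def keyword_density(text, keyword):
    """Percentage of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100 * words.count(keyword.lower()) / len(words)

normal = "Our guide covers museums, cafes and cheap flights to Paris."
stuffed = "Paris flights Paris hotels Paris cheap Paris deals Paris Paris."

print(keyword_density(normal, "paris"))   # 10.0
print(keyword_density(stuffed, "paris"))  # 60.0
```

Both sentences are ten words long, yet the stuffed one scores six times higher for "paris" while telling the reader almost nothing.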
Search engines responded by developing more complex ranking algorithms that took into account additional factors that were harder for webmasters to manipulate. Larry Page and Sergey Brin, graduate students at Stanford University, developed "Backrub", a search engine that relied on a mathematical algorithm to rate the prominence of web pages. The number calculated by the algorithm, PageRank, is a function of the quantity and strength of a page's inbound links. PageRank estimates the likelihood that a given page will be reached by a web user who surfs the web at random, following links from one page to another. In effect, some links are stronger than others: a link from a page with a higher PageRank counts for more, because the random surfer is more likely to reach that page.
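The random-surfer idea is commonly written as PR(p) = (1 − d)/N + d × Σ PR(q)/L(q), where the sum runs over every page q linking to p, L(q) is the number of links on q, N is the number of pages, and d is a "damping factor" (the chance the surfer keeps clicking rather than jumping to a random page). The sketch below computes this by repeated iteration on a hypothetical four-page web. Assumptions worth flagging: d = 0.85 is the value commonly cited from Page and Brin's published description, pages with no outbound links are treated as linking to every page (one common convention), and this is the textbook formulation rather than anything Google runs today.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank. `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly across its outbound links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: assume the surfer jumps to any page at random.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: A and B each have exactly one inbound link.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because three pages link to it. A and B each receive a single link, yet A ends up well ahead of B: A's link comes from the strong page C, which passes on its whole share, while B's link comes from A, which splits its rank between two links. D, with no inbound links, keeps only the random-jump share.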