Before discussing referencing, it is important to have an overview of how a search engine works. Since Google is the biggest of the group, we will use its algorithm to show briefly how things happen. As you may suspect, this is the key to the entire SEO procedure.
A search engine is a piece of software that allows users to find the resources they are searching for (web pages, forums, images, video files, etc.) with the help of words or groups of words.
The algorithm is as follows:
– The crawlers receive from the URL server all addresses to be visited.
– The pages fetched by the crawlers are sent to the store server, which compresses them and stores them in the repository.
– The indexer reads and decompresses the contents of the repository. It assigns an identification number (docID) to each document and converts each page into a set of term occurrences (each occurrence is called a « hit »). In addition, information on the « weight » of the word in the page (position, highlighting …) is recorded.
– The indexer distributes these occurrences into a set of « barrels » (organized by docID).
– Some of the information generated by the indexer is stored in the Anchors file: the hyperlinks and the anchor text associated with them (link text).
– The information stored in the Anchors file is read by the URL resolver, which converts each URL pointed to into a docID (if the address does not exist in the Doc Index, it is added).
– The Links database contains pairs of docIDs (derived from the URLs that the resolver receives). Each pair consists of the page that the anchor belongs to and the page that it points to.
– PageRank retrieves the information in this Links database to calculate the PageRank of each document (a ranking by popularity).
– The Sorter retrieves the data stored in the « barrels », which is organized by docID, and re-sorts it by wordID (word identifier) to produce the inverted index.
– The list of words created by the Sorter is compared with the words in the Lexicon (vocabulary), and any word missing from the Lexicon is added.
– Finally, the search itself is carried out by the Searcher in order to respond to user requests. To do so, it uses the Lexicon (created by the indexer), the inverted index from the barrels, the URLs associated with the words of the inverted index (from the Doc Index), and all the PageRank information on the popularity of pages.
– The server consults the inverted index for each search term. It thus obtains a list of documents that include the search terms (the hit list). The pages are then ranked by the server according to their popularity and relevance scores.
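To make the steps above more concrete, here is a minimal Python sketch of the same pipeline on a toy scale. It is only an illustration under simplifying assumptions, not Google's actual implementation: all function names (build_forward_index, build_inverted_index, pagerank, search) are hypothetical, pages are plain strings keyed by docID, links are given directly as pairs of docIDs, and the final score (number of hits multiplied by PageRank) is an invented stand-in for the real relevance and popularity ranking.

```python
from collections import defaultdict


def build_forward_index(pages):
    """Indexer step: map each docID to its list of hits (word, position)."""
    forward = {}
    for doc_id, text in pages.items():
        words = text.lower().split()
        forward[doc_id] = [(word, pos) for pos, word in enumerate(words)]
    return forward


def build_inverted_index(forward):
    """Sorter step: regroup the hits by word instead of by docID."""
    inverted = defaultdict(dict)
    for doc_id, hits in forward.items():
        for word, pos in hits:
            inverted[word].setdefault(doc_id, []).append(pos)
    return inverted


def pagerank(links, doc_ids, damping=0.85, iterations=50):
    """Simplified PageRank: power iteration over (source, target) docID pairs."""
    doc_ids = list(doc_ids)
    n = len(doc_ids)
    outgoing = defaultdict(list)
    for src, dst in links:
        outgoing[src].append(dst)
    rank = {d: 1.0 / n for d in doc_ids}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in doc_ids}
        for d in doc_ids:
            # Assumption: a page with no outgoing links spreads its rank evenly.
            targets = outgoing[d] or doc_ids
            share = damping * rank[d] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def search(query, inverted, rank):
    """Searcher step: keep documents containing every term, then rank them."""
    terms = query.lower().split()
    if not terms:
        return []
    candidates = None
    for term in terms:
        docs = set(inverted.get(term, {}))
        candidates = docs if candidates is None else candidates & docs

    def score(doc_id):
        # Toy score: hit count (relevance) weighted by PageRank (popularity).
        hit_count = sum(len(inverted[t][doc_id]) for t in terms)
        return hit_count * rank[doc_id]

    return sorted(candidates, key=score, reverse=True)


# Toy data: three pages and the links between them, as (from, to) docID pairs.
pages = {1: "seo guide for beginners", 2: "advanced seo guide", 3: "cooking guide"}
links = [(1, 2), (3, 2), (2, 1)]

inverted = build_inverted_index(build_forward_index(pages))
rank = pagerank(links, list(pages))
print(search("seo guide", inverted, rank))
```

In this toy data set, page 2 is linked from both other pages, so it gets the highest popularity score and comes first among the pages containing both search terms. A real engine would also use the « weight » of each hit (position, highlighting) rather than a simple hit count.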