Saturday, December 10, 2011

Wikipedia

Each index server contains a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload.

1 comment:

  1. Google's server infrastructure is divided in several types, each assigned to a different purpose:

    Google load balancers take the client request and forward it to one of the Google Web Servers via Squid proxy servers.

    Squid proxy servers take the client request from load balancers and return the result if present in local cache otherwise forward it to Google Web Server.

    Google web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.

    Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as GoogleBot. They update the index and document databases and apply Google's algorithms to assign ranks to pages.

    Each index server contains a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload.

    Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.

    ReplyDelete