This article introduces an architecture for a document-partitioned search engine, based on a novel approach combining collection selection and load balancing, called load-driven routing. By exploiting the query-vector document model and the incremental caching technique, our architecture can compute very high quality results for any query, with only a fraction of the computational load used in a typical document-partitioned architecture. By trading off a small fraction of the results, our technique allows us to greatly reduce the computing pressure on a search engine back-end; we are able to retrieve more than 2/3 of the top-5 results for a given query with only 10% of the computing load needed by a configuration where the query is process...
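The idea of combining collection selection with load balancing can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Server` class, the `route` function, the per-window `capacity`, and the acceptance rule (a server takes a query only while its recent load is below its capacity scaled by its relative selection score) are all hypothetical choices made for illustration. The selection scores are assumed to come from some collection selection function that is not shown here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """One index partition. All fields are illustrative assumptions."""
    name: str
    capacity: int            # assumed: max queries accepted per window
    window: int = 10         # assumed: window length, in broker time ticks
    accepted: deque = field(default_factory=deque)

    def load(self, now: int) -> int:
        # Forget queries that have fallen out of the sliding window.
        while self.accepted and self.accepted[0] <= now - self.window:
            self.accepted.popleft()
        return len(self.accepted)


def route(scores: dict, servers: list, now: int) -> list:
    """Return the names of the servers that will process the query.

    scores maps server name -> collection selection score (higher = better).
    The top-ranked server always gets the query; every other server accepts
    only if its recent load is below capacity * (its score / top score).
    """
    ranked = sorted(servers, key=lambda s: scores.get(s.name, 0.0), reverse=True)
    top = scores.get(ranked[0].name, 0.0) or 1.0  # avoid division by zero
    chosen = []
    for rank, server in enumerate(ranked):
        relative = scores.get(server.name, 0.0) / top
        if rank == 0 or server.load(now) < server.capacity * relative:
            server.accepted.append(now)
            chosen.append(server.name)
    return chosen


servers = [Server("A", capacity=2), Server("B", capacity=2), Server("C", capacity=2)]
scores = {"A": 1.0, "B": 0.5, "C": 0.0}
first = route(scores, servers, now=0)   # B is idle, so it also accepts
second = route(scores, servers, now=1)  # B is now at its scaled limit
```

Under this rule, less promising partitions answer a query only when they are lightly loaded, so the back-end's load is capped while the most relevant partition is always consulted.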
Better system resource utilization for search engine clusters can result in significant benefits. By...
This work performs a thorough characterization and analysis of the open source Lucene search library...
The amount of content on the Internet is growing rapidly, as is the number of online Interne...
To address the rapid growth of the Internet, modern Web search engines have to adopt distri...
To address the rapid growth of the Internet, modern Web search engines have to adopt distributed orga...
Web search engines have to deal with a rapidly increasing amount of information, high query loads an...
In this paper, we introduce a new collection selection strategy to be operated in search engines wit...
Caching of query results is an important mechanism for efficiency and scalability of web search engi...
Information retrieval systems often have to deal with very large amounts of data. They must be able ...
The amount of available data has increased notably in the last few years, exposing scalability probl...
In this thesis, we present a distributed architecture for a Web search engine, based on the concept ...
Results caching is an efficient technique for reducing the query processing load, hence it is common...
This research focuses on automatically adapting a search engine size in response to fluctuations in ...
Large web search engines process billions of queries each day over tens of billions of documents wit...
Large-scale web search engines are composed of multiple data centers that are geographically distant...