In this thesis, we present a distributed architecture for a Web search engine, based on the concept of collection selection. We introduce a novel approach to partitioning the document collection that greatly improves the effectiveness of standard collection selection techniques (CORI), and a new selection function that outperforms the state of the art. Our technique is based on the query-vector (QV) document model, built from the analysis of query logs, and on our strategy of co-clustering queries and documents at the same time. As a side benefit, our partitioning strategy identifies documents that can be safely moved out of the main index (into a supplemental index) with minimal loss in result accuracy. In our test, we could ...
Better system resource utilization for search engine clusters can result in significant benefits. By...
Large-scale web search engines are composed of multiple data c...
A fully operational large scale digital library is likely to be based on a distributed architecture ...
In this paper, we introduce a new collection selection strategy to be operated in search engines wit...
Web search engines are the most popular means of interaction with the Web. Realizing a search engine ...
To address the rapid growth of the Internet, modern Web search engines have to adopt distri...
This article introduces an architecture for a document-partitioned search engine, based on a novel a...
In this dissertation, we present protocols for building a distributed search infrastructure over str...
To address the rapid growth of the Internet, modern Web search engines have to adopt distributed orga...
This research focuses on automatically adapting a search engine size in response to fluctuations in ...
World Wide Web search engines process millions of queries per day from users all over the world. Eff...
The web is becoming more dynamic due to the increasing engagement and contribution of Internet users...
People use web search engines to fill a wide variety of navigational, informational and transactiona...
In this poster we describe the development of a distributed search engine, referred to as Físréal, w...
The creation of very large-scale multimedia search engines, with more than one billion images and v...