A typical web search engine consists of three principal parts: a crawling engine, an indexing engine, and a searching engine. The present work aims to optimize the performance of the crawling engine, which finds new web pages and updates the pages already stored in the search engine's database. The crawling engine has several robots collecting information from the Internet. We first calculate various performance measures of the system (e.g., the probability that an arbitrary page is lost due to buffer overflow, the probability of system starvation, and the average waiting time in the buffer). Intuitively, we would like to avoid system starvation and at the same time minimize information loss. We formulate the p...
We consider the task of scheduling a crawler to retrieve from several sites th...
We study the problem of caching query result pages in Web search engines. Popular search eng...
With an ever increasing amount of data that is shared and posted on the Web, the desire and necessit...
People naturally favor low-effort heuristics when searching for scientific literature. Despite...
A search engine maintains local copies of different web pages to provide quick...
A search engine uses a web crawler to crawl the pages from the world wide web ...
Crawling algorithms have been the subject of extensive research and optimizations, but some importan...
Large search engines are complex systems composed of several services. Each service is compo...
Search engines must keep an up-to-date image of all Web pages and other web resources hosted...
Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy ...
For many search settings, distributed/replicated search engines deploy a large number of machines to...
The rapid increase in the amount of available information from various online sources poses new cha...
Large web search engines process billions of queries each day over tens of billions of documents wit...
In this paper we address the problem of estimating the index size needed by web search engines to an...
Search engine optimization (SEO) is the process of improving the visibility and scope of a...