Parallel web crawling is an important technique employed by large-scale search engines for content acquisition. A commonly used inter-processor coordination scheme in parallel crawling systems is the link exchange scheme, where discovered links are communicated between processors. This scheme can attain the coverage and quality level of a serial crawler while avoiding redundant crawling of pages by different processors. The main problem in the exchange scheme is the high inter-processor communication overhead. In this work, we propose a hypergraph model that reduces the communication overhead associated with link exchange operations in parallel web crawling systems by intelligent assignment of sites to processors. Our hypergraph model can c...
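As a rough illustration of the kind of model described above (not the authors' implementation), the sketch below assumes each site is a vertex weighted by its page count and each inter-site link becomes a net spanning the two sites, so that the number of cut nets under a site-to-processor assignment approximates the link-exchange messages; a partitioning tool would then search for an assignment with low cut and balanced load. The site names and helper functions are hypothetical.

# A minimal sketch of a site-level hypergraph model for crawl partitioning,
# under the assumptions stated above; the real model may define nets per page
# or with multiplicities.
from collections import defaultdict

def build_site_hypergraph(site_pages, inter_site_links):
    """site_pages: {site: page_count}; inter_site_links: iterable of (src_site, dst_site)."""
    vertices = dict(site_pages)                       # vertex weights = pages to crawl
    nets = {frozenset(e) for e in inter_site_links if e[0] != e[1]}
    return vertices, nets

def cut_and_imbalance(vertices, nets, assignment, num_parts):
    """Cut nets ~ link-exchange messages; imbalance ~ uneven crawling load."""
    cut = sum(1 for net in nets if len({assignment[s] for s in net}) > 1)
    loads = defaultdict(int)
    for site, weight in vertices.items():
        loads[assignment[site]] += weight
    avg = sum(vertices.values()) / num_parts
    return cut, max(loads.values()) / avg

# Toy usage with hypothetical sites and a hand-picked 2-way assignment.
pages = {"a.com": 40, "b.com": 35, "c.com": 30, "d.com": 45}
links = [("a.com", "b.com"), ("a.com", "c.com"), ("c.com", "d.com")]
V, N = build_site_hypergraph(pages, links)
print(cut_and_imbalance(V, N, {"a.com": 0, "b.com": 0, "c.com": 1, "d.com": 1}, 2))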
In this paper, we present the design and implementation of a distributed web crawler. We begin by mo...
As the size of the Web grows, it becomes increasingly important to parallelize a crawling ...
In this paper, we put forward a technique for parallel crawling of the web. The World Wide...
The need to quickly locate, gather, and store the vast amount of material in the Web necessitates pa...
This paper presents a multi-objective approach to Web space partitioning, aimed to improve distribut...
A power method formulation, which efficiently handles the problem of dangling pages, is investigated...
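For context, the sketch below shows the generic textbook power-method iteration for PageRank with the common uniform-redistribution handling of dangling pages (pages with no out-links); it is not necessarily the exact formulation investigated in that work, and the toy graph is hypothetical.

# Power-method PageRank with dangling-page mass spread uniformly over all pages.
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=100):
    """out_links: {page: [pages it links to]}; returns {page: rank}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass held by dangling pages is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for src, targets in out_links.items():
            if targets:
                share = alpha * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Toy usage: page "c" is dangling.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": []}))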
With the ever proliferating size and scale of the WWW [1], efficient ways of exploring cont...
Web crawlers visit internet applications, collect data, and learn about new web pages from visited p...
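The basic frontier loop behind that description can be sketched as follows; fetch_page() and extract_links() are hypothetical stand-ins for real HTTP fetching and HTML parsing, so only the control flow (visit a page, collect its data, enqueue newly discovered links) is illustrated.

# A minimal breadth-first crawl frontier; I/O is abstracted behind the two callables.
from collections import deque

def crawl(seeds, fetch_page, extract_links, max_pages=1000):
    frontier = deque(seeds)
    seen = set(seeds)
    collected = {}
    while frontier and len(collected) < max_pages:
        url = frontier.popleft()
        page = fetch_page(url)                 # download the page
        if page is None:
            continue
        collected[url] = page                  # collect its data
        for link in extract_links(page):       # learn about new web pages
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected

# Toy usage over an in-memory "web" where each page is just its list of links.
web = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}
print(sorted(crawl(["p1"], lambda u: web.get(u), lambda links: links)))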
Our group designed and implemented a web crawler this semester. We investigated techniques that woul...
The Internet is large and has grown enormously; search engines are the tools for Web s...
This paper evaluates scalable distributed crawling by means of the geographical partition of the Web...
PageRank is the measure of importance of a node within a set of nodes. It was originally developed f...
We investigate hypergraph partitioning-based methods for efficient paralleliza...