Abstract: As the size of the Web grows, it becomes increasingly important to parallelize the crawling process in order to download pages in a reasonable amount of time. This paper presents the design and implementation of an effective parallel web crawler. We first present various design choices and strategies for a parallel web crawler, and describe our crawler's architecture and implementation techniques. In particular, we investigate the URL distributor for URL balancing and the scalability of our crawler.
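The abstract above does not spell out how its URL distributor balances work, so the sketch below illustrates one common scheme under that assumption: hash each URL's host name and route the URL to one of N parallel crawler processes, so that all pages of a host are fetched by the same process. All class and method names here are illustrative, not the paper's actual design.

    import java.net.URI;

    // Hash-based URL distributor sketch: a URL is routed to one of
    // numCrawlers processes by hashing its host name, keeping every
    // page of a given host on the same process.
    public class UrlDistributor {
        private final int numCrawlers;

        public UrlDistributor(int numCrawlers) {
            this.numCrawlers = numCrawlers;
        }

        // Index of the crawler process responsible for this URL.
        public int assign(String url) {
            String host = URI.create(url).getHost();
            // floorMod avoids a negative index when hashCode() is negative.
            return Math.floorMod(host.hashCode(), numCrawlers);
        }

        public static void main(String[] args) {
            UrlDistributor d = new UrlDistributor(4);
            // Both URLs share a host, so both land on the same process.
            System.out.println(d.assign("http://example.com/a.html"));
            System.out.println(d.assign("http://example.com/b.html"));
        }
    }

Host-based partitioning also makes per-host politeness (rate limiting) local to a single process, which is why it is a common default in parallel crawlers.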
The expansion of the World Wide Web has led to a chaotic state where the users of the internet have ...
Abstract. Crawling web applications is important for indexing, accessibility and security assessmen...
This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable...
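Mercator's extensibility comes from its modular design; the Java sketch below shows the general pluggable-module style of design the abstract refers to, not Mercator's actual API (the interface and class names are assumptions):

    import java.util.ArrayList;
    import java.util.List;

    // Pluggable per-document processing: new functionality is added by
    // implementing a small interface and registering it with the core.
    interface ProcessingModule {
        void process(String url, byte[] content); // called per downloaded page
    }

    class CrawlerCore {
        private final List<ProcessingModule> modules = new ArrayList<>();

        void register(ProcessingModule m) {
            modules.add(m);
        }

        // Invoked by the fetch pipeline after each page is downloaded.
        void handleDocument(String url, byte[] content) {
            for (ProcessingModule m : modules) {
                m.process(url, content);
            }
        }
    }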
Our group designed and implemented a web crawler this semester. We investigated techniques that woul...
Abstract—With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring cont...
The WWW is a collection of hyperlinked documents available in HTML format [10]. This collection is very hug...
Abstract: In this paper, we put forward a technique for parallel crawling of the web. The World Wide...
Web crawlers visit internet applications, collect data, and learn about new web pages from visited p...
Web Crawlers form the backbone of applications that facilitate Web information retrieval. Generic c...
A Web Crawler, also known as a “Web Robot”, “Web Spider”, or merely a “Bot”, is software for downloadi...
One of the main objectives in designing a Parallel Incremental Web Crawler is to provide a solution ...
Web page crawlers are an essential component in a number of Web applications. The sheer size of the ...
The internet is large and has grown enormously; search engines are the tools for Web s...
Single crawlers are no longer sufficient to crawl the web efficiently, as the explosive growth of the we...
For efficient large-scale Web crawlers, URL duplication checking is an important technique since it ...
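The snippet above motivates URL duplication checking; below is a minimal in-memory sketch of a URL-seen test (names are assumptions; large-scale crawlers replace the plain HashSet with a disk-backed table or a Bloom filter):

    import java.net.URI;
    import java.util.HashSet;
    import java.util.Set;

    // URL-seen test sketch: a URL enters the crawl frontier only if its
    // normalized form has not been recorded before. Assumes absolute
    // http/https URLs; this rough normalization drops query strings and
    // fragments, which real crawlers handle more carefully.
    public class UrlSeenTest {
        private final Set<String> seen = new HashSet<>();

        // Returns true if the URL was new (and records it as seen).
        public boolean addIfNew(String url) {
            return seen.add(normalize(url));
        }

        private static String normalize(String url) {
            URI u = URI.create(url.trim());
            String host = u.getHost() == null ? "" : u.getHost().toLowerCase();
            String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();
            return u.getScheme().toLowerCase() + "://" + host + path;
        }
    }

At billions of URLs, a Bloom filter trades a small false-positive rate (a few URLs wrongly treated as already seen) for a large saving in memory.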