Abstract Web archives preserve the history of born-digital content and offer great potential for sociologists, business analysts, and legal experts on intellectual property and compliance issues. Data quality is crucial for these purposes. Ideally, crawlers should gather coherent captures of entire Web sites, but the politeness etiquette and the completeness requirement mandate very slow, long-duration crawling while Web sites undergo changes. This paper presents the SHARC framework for assessing the data quality in Web archives and for tuning capturing strategies toward better quality with given resources. We define data quality measures, characterize their properties, and develop a suite of quality-conscious scheduling strategies for arch...
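The abstract is truncated before SHARC's actual definitions, so the sketch below relies on a simplified model that is assumed here, not taken from the paper. Each page changes as a Poisson process with a known rate. The "blur" of a visit is the expected number of changes between the visit time and a reference time, taken to be the crawl midpoint. A set of captures is "coherent" if at least one instant falls inside every captured version's validity interval. Under that model, giving the fastest-changing pages the visit slots closest to the midpoint minimizes total blur for a fixed set of slots (rearrangement inequality). All names (`Capture`, `is_coherent`, `expected_blur`, `middle_out_schedule`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    url: str
    valid_from: float   # hypothetical: time the captured version appeared
    valid_until: float  # hypothetical: time the captured version was replaced


def is_coherent(captures: list[Capture]) -> bool:
    """True if one instant lies inside every capture's validity interval."""
    if not captures:
        return True
    latest_start = max(c.valid_from for c in captures)
    earliest_end = min(c.valid_until for c in captures)
    return latest_start <= earliest_end


def expected_blur(schedule: dict[str, float], rates: dict[str, float],
                  t_ref: float) -> float:
    """Expected number of changes between each visit and t_ref (Poisson model)."""
    return sum(rates[url] * abs(t - t_ref) for url, t in schedule.items())


def middle_out_schedule(rates: dict[str, float],
                        slots: list[float]) -> dict[str, float]:
    """Give the fastest-changing pages the slots closest to the crawl midpoint."""
    if len(slots) != len(rates):
        raise ValueError("need exactly one visit slot per page")
    t_ref = (min(slots) + max(slots)) / 2
    by_closeness = sorted(slots, key=lambda t: abs(t - t_ref))
    by_rate = sorted(rates, key=lambda url: rates[url], reverse=True)
    return dict(zip(by_rate, by_closeness))
```

With `rates = {"a": 0.1, "b": 2.0, "c": 0.5}` and `slots = [0.0, 1.0, 2.0]`, page `b` is assigned the midpoint slot `1.0`. Any other assignment of the same slots yields equal or higher blur.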
Since the late 1990s, there has been a large investment in web archiving. Accessing ...
Web archival materials are not direct traces of the web; they are direct traces of crawlers. By desi...
This project is an exploratory study into the automation of the web archive quality checking process. Th...
Web archives preserve the history of born-digital content and offer great potential for sociologists...
Web archives preserve the history of Web sites and have high long-term value for media and business ...
Web archives offer a rich and plentiful source of information to researchers, analysts, and legal ex...
Due to the growing importance of the Web, several archiving institutes (nation...
The World Wide Web is a continuously evolving network of contents (e.g. Web pages, images, sound fil...
With the growing importance of the World Wide Web, the major challenges our society faces are also i...
[Purpose/significance] Quality assurance is one of the most important procedures...
The steady growth of the World Wide Web raises challenges regarding the preservation of meaningful W...