Digitization of historical documents is a challenging task in many digital humanities projects. A popular approach is to scan the documents into images and then convert the images into text using Optical Character Recognition (OCR) algorithms. However, OCR output for historical documents is usually inaccurate and requires post-processing error correction. This study investigates how crowdsourcing can be used to correct OCR errors in historical text collections, and which crowdsourcing methodology is most effective in different scenarios and for various research objectives. A series of experiments with different micro-task structures and text lengths was conducted with 753 workers on Amazon's Mec...
The millions of pages of historical documents that are digitized in libraries are increasingly used ...
Cataloguing, indexing, and correcting the OCR of digitized documents, libraries have often externali...
Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been...
Optical character recognition (OCR) for historical documents is a complex procedure subject to a uni...
In this paper we describe our efforts in reducing and correcting OCR errors in the context of buildi...
A trend to digitize historical paper-based archives has emerged in recent years, with the advent of ...
This paper describes the second round of the ICDAR 2019 competition on post-OC...
Digitized document collections often suffer from OCR errors that may impact a document's readability...
Over the past few decades, large archives of paper-based documents such as books and newspapers have...
Humanities scholars increasingly rely on digital archives for their research in place of time-consum...
For indexing the content of digitized historical texts, optical character recognition (OCR) errors a...
Humanities scholars increasingly rely on digital archives for their research instead of time-consumi...