This paper evaluates an automated scheme for aligning and combining optical character recognition (OCR) output from three scans of a book to generate a composite version with fewer OCR errors. While there has been some previous work on aligning multiple OCR versions of the same scan, the scheme introduced in this paper does not require that the scans be from the same copy of the book, or even the same edition. The three OCR outputs are combined using an algorithm that builds on a technique for aligning two sequences at a time. In the algorithm, a multiple sequence alignment of the scans is generated by zipping together pairwise alignments, and this alignment is then used to construct a corrected text. The algorithm is able to remove OCR errors so long as...
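The abstract does not say which pairwise aligner is used, how the pairwise alignments are zipped, or how the corrected text is derived from the multiple alignment. The Python sketch below is one plausible reading, not the paper's method. It assumes character-level global alignment (Needleman-Wunsch with unit edit costs), treats the first scan as a pivot to which the other two are aligned, zips the two pairwise alignments on the pivot's characters, and takes a per-column majority vote with the pivot as tiebreaker. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Sym = Optional[str]  # None marks a gap in an alignment column


def align_pair(a: str, b: str) -> List[Tuple[Sym, Sym]]:
    """Global character alignment of a and b with unit edit costs
    (Needleman-Wunsch / Levenshtein dynamic programming plus traceback)."""
    n, m = len(a), len(b)
    # cost[i][j] = edit distance between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right cell, preferring match/substitution.
    pairs: List[Tuple[Sym, Sym]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], None))  # b lacks a[i-1]
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # b has an extra character
            j -= 1
    pairs.reverse()
    return pairs


def zip_alignments(ab, ac) -> List[Tuple[Sym, Sym, Sym]]:
    """Merge pivot-vs-B and pivot-vs-C alignments into three-row columns,
    synchronising on the pivot's characters. Insertions relative to the
    pivot at the same spot are paired up (a simplifying assumption)."""
    cols: List[Tuple[Sym, Sym, Sym]] = []
    i = j = 0
    while i < len(ab) or j < len(ac):
        ins_b = i < len(ab) and ab[i][0] is None
        ins_c = j < len(ac) and ac[j][0] is None
        if ins_b and ins_c:
            cols.append((None, ab[i][1], ac[j][1]))
            i, j = i + 1, j + 1
        elif ins_b:
            cols.append((None, ab[i][1], None))
            i += 1
        elif ins_c:
            cols.append((None, None, ac[j][1]))
            j += 1
        else:  # both alignments sit on the same pivot character
            cols.append((ab[i][0], ab[i][1], ac[j][1]))
            i, j = i + 1, j + 1
    return cols


def vote(cols) -> str:
    """Majority vote per column; a gap can win. No majority -> keep the pivot."""
    out = []
    for a, b, c in cols:
        symbol, count = Counter((a, b, c)).most_common(1)[0]
        chosen = symbol if count >= 2 else a
        if chosen is not None:
            out.append(chosen)
    return "".join(out)


def combine(a: str, b: str, c: str) -> str:
    return vote(zip_alignments(align_pair(a, b), align_pair(a, c)))


print(combine("the qnick brown", "tbe quick brown", "the quick hrown"))
```

Each scan contains a different single-character error, so the vote recovers "the quick brown". Because gaps also vote, a character that only one scan inserted is dropped, and a character that only the pivot lost is restored. The quadratic dynamic-programming table is only practical for short strings. Whole books from different editions would likely need chunking or word-level alignment, which this sketch does not attempt.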
In this paper we propose an algorithm which improves the quality of the digital version of antique b...
Since 2006 the National Library of France (BnF) has developed many mass digiti...
Post-OCR is an important processing step that follows optical character recognition (OCR) and is mea...
This paper aims to evaluate the accuracy of optical character recognition (OCR) systems on real scan...
A number of projects are creating searchable digital libraries of printed books. These include the M...
This paper describes a new expert system for automatically correcting errors made by optical charact...
Millions of books from public libraries and private collections have been scanned by various organiz...
The user expectation from a digitized collection is that a full text search can be performed and tha...
One of the goals of the Aktienführer-Datenarchiv project is to process data from the Aktienführer an...
According to Wikipedia, Optical Character Recognition (OCR) “is the mechanical or electronic transla...
This paper describes a work-flow designed to populate a digital library o...