Although OCR technology is now commonplace, character recognition errors remain a problem, particularly in automated systems that extract information from printed documents. This paper proposes a method for automatically detecting and correcting OCR errors in an information extraction system. Our algorithm uses domain knowledge about likely character misrecognitions to propose corrections; it then exploits knowledge about the type of the extracted information to perform syntactic and semantic checks that validate the proposed corrections. We assess our proposal on a real-world, highly challenging dataset of nearly 800 values extracted from approximately 100 commercial invoices, and we obtained very good results.
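The propose-then-validate loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the confusion table, the two field types (`date` in DD/MM/YYYY form and `amount` with two decimals), and the helper names `candidates`, `correct`, `is_valid_date` and `is_valid_amount` are all hypothetical.

```python
import itertools
import re
from datetime import datetime

# Hypothetical confusion table: letters an OCR engine might emit in place of
# digits. These pairs are illustrative assumptions, not the paper's knowledge base.
CONFUSIONS = {
    "O": "0", "o": "0", "D": "0", "Q": "0",
    "l": "1", "I": "1", "i": "1", "|": "1",
    "S": "5", "s": "5", "B": "8", "Z": "2", "z": "2", "G": "6",
}


def candidates(value, max_edits=3):
    """Yield the raw value, then variants with 1..max_edits confusable chars replaced.

    Candidates with fewer substitutions come first, so the least invasive
    correction is tried before more aggressive ones.
    """
    positions = [i for i, ch in enumerate(value) if ch in CONFUSIONS]
    yield value
    for k in range(1, min(max_edits, len(positions)) + 1):
        for combo in itertools.combinations(positions, k):
            chars = list(value)
            for i in combo:
                chars[i] = CONFUSIONS[chars[i]]
            yield "".join(chars)


def is_valid_date(s):
    # Syntactic check: the string must look like DD/MM/YYYY.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", s):
        return False
    # Semantic check: it must also be a real calendar date (rejects 31/02/...).
    try:
        datetime.strptime(s, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def is_valid_amount(s):
    # Syntactic check only: digits with optional thousands separators and two decimals.
    return re.fullmatch(r"\d{1,3}(,\d{3})*\.\d{2}", s) is not None


# Hypothetical mapping from the type of an extracted field to its checks.
VALIDATORS = {"date": is_valid_date, "amount": is_valid_amount}


def correct(value, field_type, max_edits=3):
    """Return the first candidate that passes the field's checks, or None."""
    check = VALIDATORS[field_type]
    for cand in candidates(value, max_edits):
        if check(cand):
            return cand
    return None


if __name__ == "__main__":
    print(correct("3O/O6/2O21", "date"))  # 30/06/2021
    print(correct("l2.5O", "amount"))     # 12.50
    print(correct("3l/O2/2024", "date"))  # None: 31/02/2024 fails the semantic check
```

In this sketch a value that already passes its checks is returned unchanged, and `None` signals that no proposed correction survived validation, so the caller can flag the field instead of accepting an unverified guess.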