The French National Library (BnF) has launched many mass digitization projects in order to give access to its collection. Digital documents on Gallica (the digital library of the BnF) are indexed through their textual content, which is obtained from service providers that use Optical Character Recognition (OCR) software. OCR software has become an increasingly complex system composed of several subsystems dedicated to the analysis and recognition of the elements in a page. However, the reliability of these systems remains an issue. Indeed, in some cases, we can find errors in OCR outputs that occur because of an accumulation of several errors at different levels in the OCR process. One of the frequen...
Commercial OCR packages work best with high-quality scanned images. They often produce poor results w...
The millions of pages of historical documents that are digitized in libraries are increasingly used ...
The work reported in this paper aims at performance optimization in the digiti...
This work focuses on the assessment of character recognition results produced automatically by opti...
The user expectation from a digitized collection is that a full text search can be performed and tha...
Post-OCR is an important processing step that follows optical character recognition (OCR) and is mea...
Mass digitization of historical documents is a challenging problem for optical character recognition...
The accuracy of Optical Character Recognition (OCR) technologies considerably impacts the way digita...
Event detection (ED) is a crucial task for natural language processing (NLP) and it involves the ide...
This paper describes a new expert system for automatically correcting errors made by optical charact...
Post processing is the most conventional approach for correcting errors that are caused by Optical C...
Over the past years, considerable effort has been put into digitising library collections. As part o...