This work focuses on assessing the character recognition results produced automatically by optical character recognition (OCR) software in mass digitization projects. The goal is to design a global control system robust enough to handle the BnF document collection, which includes old documents that are difficult for OCR to process. We designed a word detection system to detect missed-word defects in OCR results, and a word recognition rate estimator to assess the quality of the word recognition results produced by OCR. We create two kinds of descriptors to characterize OCR outputs: image descriptors to characterize page segmentation results, and cross-alignment descriptors to characterize the quality of word recognition result...
We consider a model for which it is important, early in processing, to estimate some variables with...
The millions of pages of historical documents that are digitized in libraries are increasingly used ...
The French National Library (BnF) has launched many mass digitization project...
The user expectation from a digitized collection is that a full text search can be performed and tha...
Since 2006 the National Library of France (BnF) has developed many mass digiti...
Born-analog documents contain enormous knowledge which is valuable to our society. For the purpose o...
We present an experiment conducted on the automatic spelling correction of tex...