Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was of poor quality or because of excessive photocopying. The ability to predict the word failure rate of OCR from a statistical analysis of the image can help in making decisions in the trade-off between the success rate of OCR and the cost of human correction of errors. This paper describes an investigation of OCR of degraded text images using a standard OCR engine (Adobe Capture). The documents were selected from those in the archive at Los Alamos National Laboratory. By introducing noise in a controlled manner into perfect documents, we show how the quality of OCR can be predicte...
Is an average OCR quality of 70% enough for my study? What OCR quality should we ask from external s...
Humanities scholars increasingly rely on digital archives for their research instead of time-consumi...
Is an average OCR quality of 70% enough for my study? What OCR quality should we ask from external s...
Clean documents are relatively easy to recognize. However, when digitizing collections of documents,...
Clean documents are relatively easy to recognize. However, when digitizing collections of documents,...
Clean documents are relatively easy to recognize. However, when digitizing collections of documents,...
OCR often performs poorly on degraded documents. One approach to improving performance is to determi...
Clean documents are relatively easy to recognize. However, when digitizing collections of documents,...
OCR often performs poorly on degraded documents. One approach to improving performance is to determi...
Clean documents are relatively easy to recognize. However, when digitizing collections of documents...
Over the past years, considerable effort has been put into digitising library collections. As part o...
The user expectation from a digitized collection is that a full text search can be performed and tha...
It is desirable to convert paper text documents to a computer readable and searchable form. For curr...
Mass digitization of historical documents is a challenging problem for optical character recognition...
Humanities scholars increasingly rely on digital archives for their research in place of...