Is an average OCR quality of 70% enough for my study? What OCR quality should we ask from external suppliers? Should we re-do the OCR of our collections to bring it from 80% to 85%? Libraries and researchers alike face the same dilemma in our times of textual abundance: when is OCR quality good enough? User access, scientific results and the investment of limited resources increasingly depend on answering this question. This project focuses on a comprehensive assessment of the impact of OCR quality in Dutch newspaper, journal and book collections, comparing it with published results for English and French. This is done via extrinsic evaluation: assessing results from a set of representative downstream tasks, such as text classification ...
This document notes most of the research I had done for the National Library of the Netherlands (Kon...
Commercial OCR packages work best with high-quality scanned images. They often produce poor results w...
OCR often performs poorly on degraded documents. One approach to improving performance is to determi...
Over the past years, considerable effort has been put into digitising library collections. As part o...
The user expectation from a digitized collection is that a full text search can be performed and tha...
We conduct an assessment of the impact of OCR quality in collections in Dutch, considering two tasks...
Iterating with new and improved OCR solutions enforces decision making when it comes to targeting th...
The millions of pages of historical documents that are digitized in libraries are increasingly used ...
Humanities scholars increasingly rely on digital archives for their research instead of time-consumi...
Historical newspapers are increasingly accessed digitally for different purposes both by p...
Humanities scholars increasingly rely on digital archives for their research in place of...
The study of texts using a qualitative approach remains the dominant modus operandi in humanities re...
We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke wi...