We present an OCR ground truth data set for historical prints and show that training on this data improves recognition results over baselines. We reflect on the reusability of the ground truth data set based on two experiments that look into the legal basis for reuse of digitized document images in the case of 19th century English and German books. We propose a framework for publishing ground truth data even when digitized document images cannot be easily redistributed.
This set consists of 7498 volumes published between 1860 and 1869. The dataset comprises text from the col...
Humanities scholars increasingly rely on digital archives for their research in place of...
The National Library of Finland (NLF) has digitized historical newspapers, journals and ephemera pub...
This dataset comprises 74 digitised images (TIFF files) drawn from a selection of early printed Beng...
GT4HistOCR contains ground truth for research in Optical Character Recognition (OCR) technology appl...
This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Beng...
This dataset contains 50 pages of ground truth data for digitized historical newspapers from the Ber...
Using annotation software provided through the Transkribus Platform we annotated scans, concerning m...
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recogni...
Recent advances in Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) have l...
The republic print dataset consists of 107 ground truthed scans. Using annotation software provided...
Over the past years, considerable effort has been put into digitising library collections. As part o...