For more than a decade, Republican magazines and newspapers have been collected by institutes and projects now joined in the Centre for Asian and Transcultural Studies (CATS) at Heidelberg University. Our platform “Early Chinese Periodicals Online” (ECPO, https://uni-heidelberg.de/ecpo) provides open access to more than 300,000 digital images and their metadata (cf. [6] and [7]). Since the material consists mostly of image scans, the project ran a number of experiments to explore possible approaches to full-text generation [8]. For newspapers printed in Latin scripts, much has changed since Rose Holley deemed the use of the “‘training’ facility (artificial intelligence) in the OCR software” as “Not viable for cost effective mass scale digi...
We propose a new method for an effective removal of the printing artifacts occurring in historical n...
The user expectation from a digitized collection is that a full text search can be performed and tha...
In this thesis we work on recognizing the text in the book ``Rerum Frisicarum Historia'' by Ubbo Emm...
In our paper we present the first results from a systematic approach to full text extraction from a ...
This work presents methods and results of an initial step towards full text extraction from a Republ...
Abstract (cf. The Book of Abstracts, p. 11-13): The use of convolutional neural networks in digitiz...
A poster presentation at the 8th conference of the association "Digital Humanities im deutschsprachigen Raum...
Large text corpora are indispensable for natural language processing. However, in various fields suc...
Optical Character Recognition (OCR) is commonly used nowadays for printouts and documents conversion...
We’ve been developing a Chinese OCR engine for machine printed documents. Currently, our OCR engine ...
This article proposes a technique for correcting Chinese OCR errors to support retrieval of scanned ...
Abstract. Many historical newspapers are being digitized. We aim to support access to them via text ...
Current general digitization approach of paper media is converting them into the digital images by a...
We present an early version of a complete Optical Character Recognition (OCR) system for Tamil newsp...