The goal of this work is to develop statistical natural language models and processing techniques based on Recurrent Neural Networks (RNNs), especially the recently introduced Long Short-Term Memory (LSTM). Because of their ability to adapt and predict, these methods are more robust and easier to train than traditional methods, i.e., word lists and rule-based models. They improve the output of recognition systems and make it more accessible to users for browsing and reading. These techniques are especially needed for historical books, which can take years of effort and considerable expense to transcribe manually. The contributions of this thesis are several new methods that combine high computational performance with high accuracy. First, an ...
The lack of a spelling convention in historical documents makes their orthography change dependin...
A trend to digitize historical paper-based archives has emerged in recent years, with the advent of ...
Optical character recognition (OCR) is crucial for deeper access to historical collections. OCR ne...
For indexing the content of digitized historical texts, optical character recognition (OCR) errors a...
The task of printed Optical Character Recognition (OCR), though considered "solved" by many, stil...
The digitization of historical handwritten document images is important for the preservation of cult...
Over the past few decades, large archives of paper-based documents such as books and newspapers have...
© ACM 2013. This is the author's version of the work. It is posted here for your personal use. Not ...
In this paper we present a novel approach to the automatic correction of OCR-i...
Interdisciplinary collaboration between two faculty members in the humanities and computer science, ...
Historical documents are increasingly being made available in digitized form. Frequently they are ...