this paper we propose to use the Bible as a dataset for comparing OCR accuracy across languages. Besides being available in a wide range of languages, Bible translations are closely parallel in content, carefully translated, surprisingly relevant with respect to modern-day language, and quite inexpensive. A project at the University of Maryland is currently implementing this idea. We have created a scanned image dataset with groundtruth from an Arabic Bible. We have also used image degradation models to create synthetically degraded images of a French Bible. We hope to generate similar Bible datasets for other languages, and we are exploring alternative corpora such as the Koran and the Bhagavad Gita that have similar properties. Quantitative O...
While digital libraries based on page images and automatically generated text have made possible ma...
This paper argues that in view of the proliferation of English translations of the Quran, a systemat...
Abstract—Extracting knowledge from text documents has become one of the main hot topics in the field...
The Author(s) 2014. This article is published with open access at Springerlink.com. Abstract: We descr...
The Bible is one of the most widely translated and read books in history available in as many as 125...
This article proposes a new method of Bible interpretation by comparing translation products of Bible ...
This contribution to a special issue on “Computer-aided processing of intertextuality” in ancient te...
This study aims at providing scientific arguments against the phenomena that appear in the Bible tra...
This paper reports on a project to annotate biblical texts in order to create an aligned multilingua...
Related data set “BHSA” with URL http://doi.org/10.5281/zenodo.1302798 in repository “Zenodo”. The t...
The user expectation from a digitized collection is that a full text search can be performed and tha...
Over the past years, considerable effort has been put into digitising library collections. As part o...
This paper deals with the role of Biblical referential sources as employed by the translator when cr...
Optical character recognition (OCR) refers to the process of converting text document images in...