A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old, and a variety of processing steps are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance, so a good automatic statistical evaluation of OCR performance on book-length material is needed. Evaluating OCR performance on t...
This paper describes an approach for identifying translations of books in large scanned book collect...
In large-scale digitization processes, several common tasks are performed to provide an electronic ...
An efficient word spotting framework is proposed to search text in scanned books. The proposed metho...
This paper aims to evaluate the accuracy of optical character recognition (OCR) systems on real scan...
This paper evaluates an automated scheme for aligning and combining optical character recognition (O...
Millions of books from public libraries and private collections have been scanned by various organiz...
In this paper, we present the implementation and evaluation of first order and second order Hidden M...
Since 2006 the National Library of France (BnF) has developed many mass digiti...
Whole-book recognition is a document image analysis strategy that operates on the complete set of a ...
The aim of this work is to propose a new approach to the recognition of historical texts by providin...