Millions of books from public libraries and private collections have been scanned by various organizations in the last decade. The motivation is to preserve the written human heritage in electronic format for durable storage and efficient access. The information buried in these large book collections has always been of major interest for scholars from various disciplines. Several interesting research problems can be defined over large collections of scanned books given their corresponding optical character recognition (OCR) outputs. At the highest level, one can view the entire collection as a whole and discover interesting contextual relationships or linkages between the books. A more traditional approach is to consider each scanned book s...
The aim of this work is to propose a new approach to the recognition of historical texts by providin...
This article describes the work performed in the Pattern Redundancy Analysis f...
Conventional optical character recognition (OCR) systems operate on individual characters and words,...
This paper describes an approach for identifying translations of books in large scanned book collect...
A framework is presented for discovering partial duplicates in large collections of scanned books wi...
This paper aims to evaluate the accuracy of optical character recognition (OCR) systems on real scan...
A number of projects are creating searchable digital libraries of printed books. These include the M...
This paper evaluates an automated scheme for aligning and combining optical character recognition (O...
Whole-book recognition is a document image analysis strategy that operates on the complete set of a ...
We present a novel general method for discovering similar passages within large text documents based...
An efficient word spotting framework is proposed to search text in scanned books. The proposed metho...
Many large collections of full-text documents are currently stored in machine-readable form and pro...