This work introduces a practical method for performing logical layout analysis on heterogeneous periodical collections. The described module is incorporated into the Fraunhofer document image understanding system and has been successfully used as part of mass digitization projects on more than 500 000 scanned pages. Our primary targets are documents with complex layouts such as newspapers; however, the described methods can easily be adapted to non-periodical publications. While encouraging, the experimental results obtained on a heterogeneous set of digitized newspaper and chronicle pages spanning about 70 years reflect the high complexity of the generic, automated layout analysis problem. Our results allow the identification of promising areas ...
The automated discovery of logical structure in text documents is an important problem that has rece...
Document image understanding refers to logical and semantic analysis of document images in order to ...
A vast amount of digital document material is continuously being produced as part of major digitizat...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly,...
We present a fully implemented system based on generic document knowledge for detecting the logical ...
In recent years, libraries and archives led important digitisation campaigns that opened the access ...
Background. In recent years, libraries and archives led important digitisation...
Background. In recent years, libraries and archives led important digitisation campaigns that opened ...
Newspapers are documents made of news items and informative articles. They are ...
Background. In recent years, libraries and archives led important digitisation campaigns that opened...
An important aspect of document understanding is document logical structure derivation, which involv...
In this paper we present and discuss a novel approach to modeling logical structures of documents, b...
The current spread of digital documents has raised the need for effective content-based retrieval techni...
Dataset for Logical-layout analysis on French Historical Newspapers. This is a dataset for training...
Document Analysis and Recognition consist in translating their images into an elect...