In this article, we show how concepts from traditional text layout practices (ruling, grid) can improve document digitization. We first present these basic layout methods, some in use since Antiquity, and explain how their key concepts can be ‘translated’ and applied to today’s document digitization. In particular, we show that the traditional concept of type area is a key notion for modeling document layout. An algorithm to compute the type area is detailed. We then illustrate this work with several practical uses and evaluations, from OCR improvement to high-level logical segmentation.
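To make the notion of type area concrete, here is a minimal sketch in Python. It is not the algorithm detailed in the article, which the abstract does not describe. It assumes, purely for illustration, that each page is given as a list of text-line bounding boxes `(x0, y0, x1, y1)`, and it estimates a shared type area as the coordinate-wise median of the per-page text extents. The names `page_extent` and `estimate_type_area` are hypothetical.

```python
from statistics import median
from typing import List, Tuple

# A box is (x0, y0, x1, y1) in page coordinates; this format is an assumption.
Box = Tuple[float, float, float, float]


def page_extent(lines: List[Box]) -> Box:
    """Return the bounding rectangle of all text-line boxes on one page."""
    return (
        min(b[0] for b in lines),
        min(b[1] for b in lines),
        max(b[2] for b in lines),
        max(b[3] for b in lines),
    )


def estimate_type_area(pages: List[List[Box]]) -> Box:
    """Estimate a shared type area as the median of per-page extents.

    Taking the median over pages keeps an occasional marginal note or
    short page from shifting the estimate. Empty pages are skipped.
    """
    extents = [page_extent(lines) for lines in pages if lines]
    if not extents:
        raise ValueError("no text lines on any page")
    return tuple(median(e[i] for e in extents) for i in range(4))


# Hypothetical input: three pages, the last one with a marginal note at x0=10.
pages = [
    [(100, 120, 500, 140), (100, 680, 480, 700)],
    [(102, 118, 498, 138), (102, 682, 470, 702)],
    [(10, 300, 60, 320), (100, 120, 500, 140), (100, 680, 490, 700)],
]
type_area = estimate_type_area(pages)
```

In this sketch the marginal note on the third page widens that page's own extent but leaves the median left edge unchanged, which is the intuition behind treating the type area as a property of the whole document and not of a single page.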
This paper presents an efficient technique for document page layout structure extraction and classif...
In this paper, a machine learning approach to support the user during the correction of the layout a...
In this paper, we propose a new dataset and a ground-truthing methodology for layout analysis of his...
Abstract — Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
Precise description of layout entities (content regions on a page) is crucial for all but the most t...
Using examples of implemented layout decisions, and from performance-based research on the effects t...
A document image is composed of a variety of physical entities or regions such as text blocks, lines...
Digitization of newspapers is of interest for many reasons including preservation of history, access...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly,...
We present and analyze efficient algorithms for the automated recognition and interpretation of layo...
Document layout segmentation and recognition is an important task in the creation of digitized docum...
This article describes the work performed in the Pattern Redundancy Analysis f...
Highly customised variable-data documents make automatic layout of th...
Background. Nowadays, information retrieval systems are becoming more and more popular; they help people ret...
The flow of information is continually increasing due to the ubiquitous use of information technolog...