Alternating horizontal and vertical projection profiles are extracted from nested sub-blocks of scanned page images of technical documents. The thresholded profile strings are parsed using the compiler utilities Lex and Yacc. The significant document components are demarcated and identified by the recursive application of block grammars. Backtracking for error recovery and branch-and-bound search for maximum-area labeling are implemented with Unix shell programs. Results of the segmentation and labeling process are stored in a labeled X-Y tree. It is shown that families of technical documents that share the same layout conventions can be readily analyzed. More than 20 types of document entities can be identified in sample pages from the IBM Journa...
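The profile-and-cut step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the source parses the profile strings with Lex/Yacc block grammars and uses shell programs for backtracking and branch-and-bound labeling, none of which is reproduced here. The sketch assumes a binary image stored as a list of rows (1 = ink), a threshold of 0, and a simple stopping rule (a block becomes a leaf when neither axis shows a white gap); the names `xy_cut`, `XYNode`, `ink_runs`, and `leaves` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # binary page image: 1 = ink (black), 0 = background


def projection_profile(img: Image, top: int, bottom: int,
                       left: int, right: int, axis: str) -> List[int]:
    """Ink count per row (axis 'h') or per column (axis 'v') inside a block."""
    if axis == "h":
        return [sum(img[r][left:right]) for r in range(top, bottom)]
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]


def profile_string(profile: List[int], threshold: int = 0) -> str:
    """Threshold a profile into a string: '1' where the count exceeds threshold."""
    return "".join("1" if v > threshold else "0" for v in profile)


def ink_runs(img: Image, top: int, bottom: int, left: int, right: int,
             axis: str, threshold: int = 0) -> Tuple[str, List[Tuple[int, int]]]:
    """Return the profile string and its half-open (start, end) runs of '1's."""
    s = profile_string(projection_profile(img, top, bottom, left, right, axis), threshold)
    return s, [(m.start(), m.end()) for m in re.finditer(r"1+", s)]


@dataclass
class XYNode:
    top: int
    bottom: int
    left: int
    right: int
    cut: str         # "h", "v", or "leaf"
    profile: str     # thresholded profile string along the chosen axis
    label: str = ""  # hypothetical placeholder; grammar-based labeling is not modeled
    children: List["XYNode"] = field(default_factory=list)


def xy_cut(img: Image, top: int, bottom: int, left: int, right: int,
           axis: str = "h", max_depth: int = 10) -> XYNode:
    """Recursively split a block, alternating the cut axis at each level."""
    other = "v" if axis == "h" else "h"
    s, runs = ink_runs(img, top, bottom, left, right, axis)
    if len(runs) <= 1:
        # No gap along this axis: try the other axis before declaring a leaf.
        s2, runs2 = ink_runs(img, top, bottom, left, right, other)
        if len(runs2) > 1:
            axis, other, s, runs = other, axis, s2, runs2
    if len(runs) <= 1 or max_depth == 0:
        return XYNode(top, bottom, left, right, "leaf", s)
    node = XYNode(top, bottom, left, right, axis, s)
    for a, b in runs:
        if axis == "h":  # row profile: split into horizontal strips
            child = xy_cut(img, top + a, top + b, left, right, other, max_depth - 1)
        else:            # column profile: split into vertical strips
            child = xy_cut(img, top, bottom, left + a, left + b, other, max_depth - 1)
        node.children.append(child)
    return node


def leaves(node: XYNode) -> List[Tuple[int, int, int, int]]:
    """Collect (top, bottom, left, right) boxes of all leaf blocks, left to right."""
    if not node.children:
        return [(node.top, node.bottom, node.left, node.right)]
    out: List[Tuple[int, int, int, int]] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


page = [
    [1, 1, 1, 1, 1, 1, 1],  # full-width line (e.g. a title)
    [0, 0, 0, 0, 0, 0, 0],  # horizontal white gap
    [1, 1, 0, 0, 0, 1, 1],  # two columns separated by a vertical gap
    [1, 1, 0, 0, 0, 1, 1],
]
root = xy_cut(page, 0, len(page), 0, len(page[0]))
print(root.cut, root.profile)  # h 1011
print(leaves(root))            # [(0, 1, 0, 7), (2, 4, 0, 2), (2, 4, 5, 7)]
```

Alternating the axis at each level mirrors the nested horizontal and vertical sub-blocks in the abstract, and each node keeps its thresholded profile string, which is the kind of symbol sequence a block grammar could consume. The `label` field is left empty because the grammar-driven labeling, error recovery, and maximum-area search are outside this sketch.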
Document layout segmentation and recognition is an important task in the creation of digitized docum...
We describe a top-down approach to the segmentation and representation of documents containing tabul...
Newspapers are documents made of news items and informative articles. They are ...
Intelligent document segmentation can bring electronic browsing within the reach of most users. The ...
A method is presented for the efficient segmentation of text lines from scanned images of technical ...
With the advent of more powerful personal computers, inexpensive memory, and digital cameras, curato...
Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm...
The main objective of this thesis is to develop a system to automatically segment and label a variet...
Digitization of newspapers is of interest for many reasons, including preservation of history, access...
The goal of this work is to add the capability to segment documents containing text, graphics, and p...
Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
This paper describes an algorithm for document representation in a reduced vectorial space by a proc...
We present a general approach for the hierarchical segmentation and labeling of document layout stru...
In this paper, as a first step toward an easy and convenient way to access the manuscripts of Atatürk wi...