Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm described here, XYCUT, uses horizontal and vertical binary profiles to produce an XY-tree representing the column structure of a page of a technical document in a single pass through the bit image. Training against ground truth adjusts a single, resolution-independent parameter, using only local information and guided by an edit distance function. The algorithm correctly segments the page image over a (fairly) wide range of parameter values, although it may make small, local, repairable errors; their effect is measured by a repair cost function.

Keywords: Column segmentation, decolumnization, XY tree, profiles

1 Introduction
Column segme...
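To make the role of binary profiles and the single parameter concrete, here is a minimal sketch of the textbook recursive XY-cut. It is not the authors' single-pass XYCUT: this version rescans each sub-block, and its parameter `min_gap` (the narrowest blank run treated as a cut) is an absolute pixel count, whereas the paper's parameter is resolution independent. All names (`binary_profile`, `split_runs`, `xy_cut`, `leaves`, `min_gap`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Image = List[List[int]]          # 1 = ink pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open


@dataclass
class XYNode:
    box: Box
    children: List["XYNode"] = field(default_factory=list)


def binary_profile(img: Image, box: Box, axis: int) -> List[int]:
    """Binary projection profile: 1 if the row (axis=0) or column (axis=1) holds any ink."""
    top, left, bottom, right = box
    if axis == 0:
        return [int(any(img[r][left:right])) for r in range(top, bottom)]
    return [int(any(img[r][c] for r in range(top, bottom))) for c in range(left, right)]


def split_runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Half-open ink spans; spans separated by fewer than min_gap (>= 1) blanks are merged."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end = 0
    for i, v in enumerate(profile):
        if not v:
            continue
        if start is None:
            start = i
        elif i - end >= min_gap:  # blank gap wide enough: close the current span
            spans.append((start, end))
            start = i
        end = i + 1
    if start is not None:
        spans.append((start, end))
    return spans


def xy_cut(img: Image, min_gap: int, box: Optional[Box] = None, axis: int = 0) -> XYNode:
    """Recursive XY-cut: split on the preferred axis, else on the other; stop when neither splits."""
    if box is None:
        box = (0, 0, len(img), len(img[0]) if img else 0)
    top, left, bottom, right = box
    node = XYNode(box)
    for ax in (axis, 1 - axis):
        spans = split_runs(binary_profile(img, box, ax), min_gap)
        if len(spans) > 1:
            for s, e in spans:
                if ax == 0:  # horizontal cut: child is a full-width band of rows
                    child = (top + s, left, top + e, right)
                else:        # vertical cut: child is a full-height strip of columns
                    child = (top, left + s, bottom, left + e)
                node.children.append(xy_cut(img, min_gap, child, 1 - ax))  # alternate direction
            break
    return node


def leaves(node: XYNode) -> List[Box]:
    """Leaf blocks in cut order (top-to-bottom for row cuts, left-to-right for column cuts)."""
    if not node.children:
        return [node.box]
    return [b for child in node.children for b in leaves(child)]


# Toy page: a full-width title, two blank rows, then two columns separated by 3 blank columns.
page: Image = (
    [[1] * 11]
    + [[0] * 11 for _ in range(2)]
    + [[1] * 4 + [0] * 3 + [1] * 4 for _ in range(4)]
)

print(leaves(xy_cut(page, min_gap=2)))  # [(0, 0, 1, 11), (3, 0, 7, 4), (3, 7, 7, 11)]
print(leaves(xy_cut(page, min_gap=4)))  # [(0, 0, 7, 11)]
```

With `min_gap=2` the page splits into the title and two columns. With `min_gap=4` the two-row gap under the title is too narrow to cut, and because the title inks every column no vertical cut exists either, so the whole page collapses into one block. This sensitivity to a single threshold is the kind of behaviour that training against ground truth is meant to tune.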