Document Structure Analysis and Performance Evaluation, by Jisheng Liang. Chair of Supervisory Committee: Professor Robert M. Haralick, Electrical Engineering. The goal of document structure analysis is to find an optimal partition of the set of glyphs on a given document into a hierarchical tree structure in which entities within the hierarchy are associated with their physical properties and semantic labels. In this dissertation, we present a unified document structure extraction algorithm that is probability based, where the probabilities are estimated from an extensive training set of various kinds of measurements of distances between the terminal and non-terminal entities with which the algorithm works. The off-line probabilities es...
Our study focuses on one of the technological barriers hindering the industrialization of syst...
In this paper, we define the table detection problem as a probability optimization problem...
This paper describes a text-line identification and segmentation technique that is probability base...
This paper presents a performance metric for document structure extraction algorithms by finding...
This paper describes a text-line identification and segmentation technique that is probability based...
The paper presents a hierarchical object recognition system for document processing. It is based on ...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly,...
In this paper we present and discuss a novel approach to modeling logical structures of documents, b...
We present a general approach for the hierarchical segmentation and labeling of document layout stru...
Image segmentation is an important component of any document image analysis system. While m...
We propose an approach for information extraction for multi-page printed document understanding. The...
This paper presents a table structure understanding algorithm designed using optimization methods. T...
Based on these observations and analysis, we propose a joint discriminative probabilistic framework...
Nowadays PDF documents have become a dominant knowledge repository for both academia and indus...
Repetition of layout structure is prevalent in document images. In document design, such repetition...