The goal of this work is to add to the open-source OCR engine OCRopus the capability to segment documents containing text, graphics, and pictures. To achieve this, OCRopus' RAST algorithm was improved to recognize non-text regions, so that mixed-content documents can be analyzed in addition to text-only documents. In addition, a method for classifying text and non-text regions was developed and implemented for the Voronoi algorithm, enabling users to perform OCR on documents segmented by this method. Finally, both algorithms were modified to work across a range of resolutions. Our testing showed an improvement of 15-40% for the RAST algorithm, giving it an average segmentation accuracy of about 80%. The Voronoi algorithm averaged around 70%...
A method is presented for the efficient segmentation of text lines from scanned images of technical ...
The main objective of this thesis is to develop a system to automatically segment and label a variet...
There is an ever-increasing number of publications which do not have the “traditional” layout where ...
With the advent of more powerful personal computers, inexpensive memory, and digital cameras, curato...
Automatic transformation of paper documents into electronic forms requires geometric document layout a...
Image thresholding and page segmentation are necessary components of any image understanding and rec...
We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely ...
Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm...
A single-parameter text-line extraction algorithm is described along with an efficient technique for...
Document page segmentation is one of the most crucial steps in document image analysis. It ideally a...
Page layout analysis has been extensively studied since the 1980s, particularly after computers beg...
Alternating horizontal and vertical projection profiles are extracted from nested sub-blocks of scan...
There is a significant need to objectively evaluate layout analysis (page segmentation and region cl...