This thesis explores the domain of document analysis and document classification within the PDF document environment. The main focus is the creation of a document classification technique which can identify the logical class of a PDF document and so provide necessary information to document-class-specific algorithms (such as document understanding techniques). The thesis describes a page decomposition technique which is tailored to render the information contained in an unstructured PDF file into a set of blocks. The new technique is based on published research but contains many modifications which enable it to competently analyse the internal document model of PDF documents. A new level of document processing is presented: advanced docume...
Legal documents often have a complex layout with many different headings, headers and footers, side ...
Paper document recognition is fundamental to office automation, becoming every day a more power...
This paper presents results in automated genre classification of digital documents in PDF format. It...
A strategy for document analysis is presented which uses Portable Document Format (PDF, the underlyin...
This paper outlines the requirements and components for a proposed Document Analysis System, which a...
This paper deals with automatic classification of text documents, showing advantages of the classifi...
The PDF format plays a crucial role in the field of electronic academic literature publishing, but d...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly,...
A paper document processing system is an information system component which transforms information o...
In a more digitalized world, companies with e-archive solutions want to be part of the usage of mode...
This thesis deals with document classification, especially with a text classification method. Main...
We present a document analysis system able to assign logical labels and extract the reading order in...
With the abundance of online research platforms, much information presented in PDF files, such as ar...
The economic feasibility of creating a large database of document images has left a tremendous need fo...