Abstract. Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, which are based either on document image analysis or on electronic content extraction. Then XCDF, a canonical format with well-defined properties, is proposed as a suitable solution for representing structured electronic documents and as an entry point for further research and work. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. We finally present current applications of this work in various domains, ranging from data mining to multimedia navigation, and cons...
Information can include text, pictures and signatures that can be scanned into a document format, su...
The paper PDF Document Format Features for Document Management and Distribution describes the core o...
Tables are an intuitive and universally used way of presenting large sets of experimental results an...
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original...
Physical and logical structure recovering from electronic documents is still an open issue. In this ...
PDF became a very common format for exchanging printable documents. Further, it can be easily genera...
A strategy for document analysis is presented which uses Portable Document Format (PDF the underlyin...
Nowadays PDF documents have become a dominating knowledge repository for both the academia and indus...
We present a progress report on our ongoing project of reverse engineering scientific PDF do...
This paper describes a tool for recombining the logical structure from an XML document with the type...
Documents are often marked up in XML-based tagsets to delineate major structural components such as ...
The PDF format plays a crucial role in the field of electronic academic literature publishing, but d...
The purpose of this article is to demonstrate a practical use case of PDF/A file format for digitiza...
The Portable Document Format (PDF), defined by Adobe Systems Inc. as the basis of its Acrobat produc...