A structured representation of document content is the basis for many systems, whether for the automated processing of documents with intelligent workflows or, more generally, for data queries on the World Wide Web. However, a large number of documents are stored in unstructured formats. A widespread example of such a format is the Portable Document Format (PDF). There is therefore great untapped potential in deriving structured information from unstructured file formats. A large share of important information is stored in tables, since they are a very dense representation of content. Owing to a virtually uncountable number of ...
This paper presents a methodology for the evaluation of table understanding algorithms for PDF docum...
Large amounts of communication, documentation as well as knowledge and information are stored in tex...
The article discusses the algorithms for detecting and extracting lines, paragraphs with their prope...
Tables are an intuitive and universally used way of presenting large sets of experimental results an...
This thesis deals with the restructuring of unstructured PDF documents containing graphical elements...
Nowadays, PDF documents have become a dominant knowledge repository for both academia and indus...
Tables in documents are a widely-available and rich source of information, but not yet well-utilised...
This thesis deals with the restructuring of unstructured PDF documents containing elemen...
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original...
The rapid increase of published research papers in recent years has escalated the need for automated...
We present a machine-learning-guided process that can efficiently extract factor tables from unstruc...
In the era of digitization, the vast volume of scientific publications has become readily accessible...
This is the author accepted manuscript. The final version is available from IEEE via the DOI in this...
This thesis explores the domain of document analysis and document classification within the PDF docu...