This thesis deals with the restructuring of unstructured PDF documents containing graphical elements such as schematics, plans and drawings. Building on the KDD (Knowledge Discovery in Databases) method for data restructuring, we introduce the (A)KDD (Anthropocentric Knowledge Discovery in Databases) method, which we derived from KDD by adding an incremental aspect and a user-centered approach. In particular, we present a technique based on the bucket sort algorithm pattern for efficiently extracting the graphic symbols contained in a PDF file. It is compared to the results obtained by Puglisi on strings. Then, we formulate the hypothesis: “taking into account the chronologi...
A complete system able to find symbols in graphical documents without a priori knowledge is proposed ...
This thesis deals with the design of digital XML publishing chains: document production software wh...
Background. In recent years, libraries and archives led important digitisation campaigns that opened...
This thesis deals with the restructuring of unstructured PDF documents containing graphical elements...
A structured representation of the content of documents is the basis for many systems, whether ...
This paper proposes a strategy for retrospective conversion of documents. This stra...
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original...
The automatic processing of written documents is a very active field in the industry. Indeed, due to...
Change detection in structured documents (e.g. SGML) is important in data warehousing, digital librar...
The current spread of digital documents has raised the need for effective content-based retrieval techni...
This thesis explores the domain of document analysis and document classification within the PDF docu...
This thesis work lies at the intersection of three research themes: the setting up of re...
This thesis tackles the problem of technical document interpretation applied to ancient and colored c...
This thesis focuses on the study of the structuring of documents said to have a "typography that is rich and recur...