A document retroconversion method is described in this paper. Based on a training step that applies data mining to document samples recognized by OCR and converted to ALTO, we generate a decision tree using the "Improved CHAID" algorithm. The decision tree is then converted into syntactic rules, which are used for structure extraction. The implementation relies on XSLT and Java. Experiments on 100 documents from three conference proceedings reached a recall between 0.73 and 1.00 and a precision between 0.84 and 1.00 on 8 logical elements.
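The step that turns the decision tree into syntactic rules can be illustrated with a minimal Python sketch. It assumes a CHAID-style tree with multiway splits on categorical layout features of an OCR text line. The feature names (`font_size`, `bold`), the labels, and the helpers (`tree_to_rules`, `apply_rules`, `format_rule`) are all hypothetical. The sketch does not reproduce the Improved CHAID split criterion or the authors' XSLT/Java implementation. Each root-to-leaf path becomes one rule: the conjunction of the tests along the path implies the leaf's logical label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """Terminal node: the logical element assigned to a text line."""
    label: str


@dataclass
class Node:
    """Internal node: a multiway split on one categorical feature (CHAID-style)."""
    feature: str
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Leaf, Node]
Condition = Tuple[str, str]         # (feature, required value)
Rule = Tuple[List[Condition], str]  # (conjunction of conditions, label)


def tree_to_rules(tree: Tree, path: Optional[List[Condition]] = None) -> List[Rule]:
    """Turn every root-to-leaf path into one 'IF conditions THEN label' rule."""
    path = path or []
    if isinstance(tree, Leaf):
        return [(list(path), tree.label)]
    rules: List[Rule] = []
    for value, child in tree.children.items():
        rules.extend(tree_to_rules(child, path + [(tree.feature, value)]))
    return rules


def apply_rules(rules: List[Rule], features: Dict[str, str]) -> Optional[str]:
    """Return the label of the rule whose conditions all hold, else None.

    Rules derived from one tree are mutually exclusive (paths diverge on
    different values of the same feature), so at most one rule can match.
    """
    for conditions, label in rules:
        if all(features.get(name) == value for name, value in conditions):
            return label
    return None


def format_rule(rule: Rule) -> str:
    """Render a rule as readable text, e.g. 'font_size = large -> title'."""
    conditions, label = rule
    return " AND ".join(f"{name} = {value}" for name, value in conditions) + " -> " + label


# Hypothetical tree over made-up layout features of an OCR text line.
tree = Node("font_size", {
    "large": Leaf("title"),
    "medium": Node("bold", {"yes": Leaf("section_heading"), "no": Leaf("author")}),
    "small": Leaf("body"),
})

rules = tree_to_rules(tree)
for rule in rules:
    print(format_rule(rule))
```

Running it prints four rules, for example `font_size = medium AND bold = yes -> section_heading`, and `apply_rules(rules, {"font_size": "medium", "bold": "no"})` returns `"author"`. Flattening the tree into independent rules is what allows them to be re-expressed in another rule language (such as XSLT templates) without carrying the tree structure along.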
This paper presents a system for discovering association rules from collections of unstructured docu...
National audience. This paper proposes a strategy for retrospective conversion of documents. This stra...
This paper describes a text mining technique for automatically extracting association rules from colle...
This paper shows the supremacy of a functional language like XSLT for document retroconve...
International audience. This paper describes the structural classification method used in a strategy f...
International audience. This paper proposes a strategy for retrospective conversion of documents. This...
International audience. This paper proposes a technique for the logical labelling of document images. ...
Text mining refers to the process of deriving high quality information from text. It is also known a...
The present work proposes a method for the automatic extraction of textual elements within documents...
Objective: Develop an automated classifier for the classification of bibliographic material by means...
This thesis presents the application of various classification techniques on text documents. Since t...
Although text mining has been successfully used in the educational sector for quite some time, its a...
International audience. In the context of the Pangea project at IBM, we needed to design an informatio...
Given a collection of diverging documents about some lost original text, any person interested in th...
Most of the electronic documents available from today's huge number of electronic information sources...