The number of XML documents on the Web has seen a phenomenal increase in the past few years. However, most existing efforts for XML document classification employ standard information retrieval methods based on the document contents alone or the document structure alone. In this paper, we present a system that classifies XML documents based on both their content and structure. In particular, we classify XML documents by a weighted combination of field-wise content similarities and show that this approach outperforms classification that ignores the structure. We also show that weighting the fields differentially outperforms an approach in which each field contributes equally to the classification process. We then present an algorithm that le...
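The weighted combination of field-wise content similarities described in this abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes each XML element tag is treated as a field, that content similarity is the cosine between bag-of-words vectors, and that field weights are supplied by hand. The function names (`field_vectors`, `cosine`, `weighted_similarity`) and the sample documents are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def field_vectors(xml_text):
    """Map each element tag to a bag-of-words Counter built from elem.text."""
    vectors = {}
    for elem in ET.fromstring(xml_text).iter():
        # Only the element's own leading text is used; tail text is ignored.
        words = (elem.text or "").lower().split()
        if words:
            vectors.setdefault(elem.tag, Counter()).update(words)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    sq_a = sum(c * c for c in a.values())
    sq_b = sum(c * c for c in b.values())
    norm = math.sqrt(sq_a * sq_b)
    return dot / norm if norm else 0.0


def weighted_similarity(xml_a, xml_b, weights):
    """Weighted average of per-field cosine similarities.

    `weights` maps a field (tag) name to a non-negative weight; the values
    are an assumption of this sketch, not taken from the paper.
    """
    va, vb = field_vectors(xml_a), field_vectors(xml_b)
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        w * cosine(va.get(field, Counter()), vb.get(field, Counter()))
        for field, w in weights.items()
    )
    return score / total


# Hypothetical documents that share a title but differ in body text.
doc_a = "<paper><title>xml clustering</title><body>trees and paths</body></paper>"
doc_b = "<paper><title>xml clustering</title><body>image retrieval</body></paper>"

equal = weighted_similarity(doc_a, doc_b, {"title": 1.0, "body": 1.0})
title_heavy = weighted_similarity(doc_a, doc_b, {"title": 3.0, "body": 1.0})
```

With equal weights the two sample documents score lower than with the title-heavy weighting, since only their titles match. That is the kind of effect differential field weighting is meant to capture; a classifier built on this would compare a document against labeled examples using such a score.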
Sources of XML documents are proliferating on the Web and documents are more and more frequently exc...
XML document clustering is essential for many document handling applications such as information sto...
With the increasing use of XML in many domains, XML document clustering has been a central research ...
This paper proposes a clustering approach that explores both the content and the structure of XML do...
This paper describes a repres...
XML is the new standard for information exchange and retrieval. As XML material becomes more abundan...
XML is touted as the breakthrough in data exchange on the web. As XML material becomes more abundant...
XML has become the universal data format for a wide variety of information systems. The large number...
Similarity between XML documents can be studied in different ways, in particular depending ...
Extensible Markup Language (XML) is a simple and flexible text format derived from Standard Generali...
This paper presents the incremental clustering algorithm, XML documents Clustering with Level Simila...
Every day more digital data in semi-structured format are available on the World Wide Web,...
The large amount and heterogeneity of XML documents on the Web require the development of clustering...
XML has gained popularity for information representation, exchange and retrieval. As XML material be...
With the explosion of XML-based online documents, the task of knowledge discovery from the...