Extensible Markup Language (XML) is a simple and flexible text format derived from Standard Generalized Markup Language (SGML) [1]. It has been widely accepted as a crucial component of many applications related to information retrieval, such as XML databases and web services. One reason for its wide acceptance is that its format can be customized for data transmission or storage. Classification is an important data mining task that aims to assign unknown objects to the classes that best characterize them. In this thesis, we propose a method to classify XML documents without assuming that they share a common schema; a schema may or may not be available, which is closer to real-world cases. Our method is similarity-based. Its main characteristi...
This paper proposes a clustering approach that explores both the content and the structure of XML do...
In the last few years we have observed a proliferation of approaches for clustering XML documents ...
Extracting information from semistructured documents is a very hard task, and is going to become mor...
The categorization of documents is traditionally topic-based. This paper presents a complementary an...
XML has gained popularity for information representation, exchange and retrieval. As XML material be...
XML is the new standard for information exchange and retrieval. As XML material becomes more abundan...
The number of XML documents on the Web has seen a phenomenal increase in the past few years. However...
XML is touted as the breakthrough in data exchange on the web. As XML material becomes more abundant...
XML has become the universal data format for a wide variety of information systems. The large number...
This paper describes a repres...
XML is rapidly emerging as the new standard for data representation and exchange on t...
XML will be the method of choice for representing all kinds of documents in product catalogs, dig...
The evolution of computing technology suggests that it has become more feasible to offer access to W...
In the last few years we have observed a proliferation of approaches for clustering XML documents an...