Querying heterogeneous collections of data-centric XML documents requires a combination of database languages and concepts used in information retrieval, in particular similarity search and ranking. In this paper we present an approach to find approximate answers to formal user queries. We reduce the problem of answering queries against XML document collections to the well-known unordered tree inclusion problem. We extend this problem to an optimization problem by applying a cost model to the embeddings. Thereby we are able to determine how close parts of the XML document match a user query. We present an efficient algorithm that finds all approximate matches and ranks them according to their similarity to the query.
Peer Reviewed. ACM SIGIR 2...
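The idea of scoring tree embeddings with a cost model can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes a toy cost model in which every document node skipped between a matched parent and a matched child costs one unit, it requires labels to match exactly, and it does not force sibling query nodes onto distinct document nodes. The names `Node`, `embed_cost`, and `ranked_matches` are hypothetical, and the sample data is invented for illustration.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def descendants(node, depth=1):
    """Yield (descendant, levels below node) for every proper descendant."""
    for child in node.children:
        yield child, depth
        yield from descendants(child, depth + 1)


def embed_cost(q, d):
    """Cheapest embedding of query subtree q with its root mapped to d.

    Assumed cost model: one unit per document node skipped on the path
    between the images of a query parent and its child. Returns inf if
    no embedding exists. No memoization, so this is only a sketch.
    """
    if q.label != d.label:
        return inf
    total = 0
    for qc in q.children:
        # Unordered: each query child picks its cheapest descendant of d.
        best = min(
            (embed_cost(qc, dd) + (dist - 1) for dd, dist in descendants(d)),
            default=inf,
        )
        if best == inf:
            return inf
        total += best
    return total


def ranked_matches(query, doc):
    """Return (cost, node) for every embedding root, cheapest first."""
    hits, stack = [], [doc]
    while stack:
        n = stack.pop()
        cost = embed_cost(query, n)
        if cost < inf:
            hits.append((cost, n))
        stack.extend(n.children)
    return sorted(hits, key=lambda h: h[0])


# Invented sample data: two cd elements with different nesting.
doc = Node("catalog", [
    Node("cd", [Node("title"), Node("performer", [Node("name")])]),
    Node("cd", [Node("tracks", [Node("track", [Node("title")])]), Node("name")]),
])
query = Node("cd", [Node("title"), Node("name")])
costs = [cost for cost, _ in ranked_matches(query, doc)]
```

Here the first `cd` matches at cost 1 (one skipped `performer` node) and the second at cost 2 (skipped `tracks` and `track` nodes), so the first is ranked higher. A query whose labels do not occur in the document yields no matches.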
Order and Return The most relevant results may be the most common form of XML query processing. To w...
As XML is becoming the de facto standard for data representation and communication, especially on th...
XML is undoubtedly becoming the predominant de facto standard for data representation and communicat...
The semantic and structural heterogeneity of large XML digital libraries emphasizes the need ...
Due to the heterogeneous nature of XML data for internet applications, exact matching of queries is o...
The progressive adoption of XML by new communities of users has motivated the appearance of applicat...
Extracting information from semistructured documents is a very hard task, and is going to become mor...
Information retrieval encounters a migration from the traditional paradigm (re...
The rapid adoption of XML as the standard for data representation and exchange foreshadows a massiv...
In this paper, we deal with the problem of effective search and query answering in heterogeneous web...
XML information items collected from heterogeneous sources often carry similar semantics but turn ou...
In the last few years several repositories for storing XML documents and languages for querying XML...
Extracting information from semistructured documents is a very hard task, and is going to...
With the increasing number of available XML documents, numerous approaches for...