In this thesis, which falls within the broader context of web information retrieval, we consider the problem of thematic and non-thematic page indexing, with particular focus on page typology. We propose a two-step page characterization method. The first step, named homogeneous corpus extraction, aims at grouping pages that share similar features. The second step, called semi-automatic metadata assignment within each homogeneous corpus, is based on propagation: to begin with, only a small proportion of all resources is manually qualified; this information is then propagated to the other resources. Methodologically, the homogeneous corpus extraction is grounded in hypertext link analysis. More precisely, it uses t...
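The propagation step described above can be illustrated with a minimal sketch. It assumes the homogeneous corpora have already been extracted and are given as plain lists of page identifiers. The function name `propagate_metadata`, the page identifiers, the labels, and the majority-vote rule are all hypothetical choices made for illustration; the abstract does not specify how the propagation is actually performed.

```python
from collections import Counter


def propagate_metadata(corpora, seed_labels):
    """Spread manually assigned labels to the other pages of each corpus.

    corpora: list of lists of page identifiers (assumed already extracted).
    seed_labels: dict mapping the few manually qualified pages to a label.
    Majority voting is an assumption of this sketch, not the thesis method.
    """
    assigned = {}
    for corpus in corpora:
        # Count the manual labels present in this corpus.
        votes = Counter(seed_labels[p] for p in corpus if p in seed_labels)
        if not votes:
            # No manually qualified page here: leave the corpus unlabeled.
            continue
        majority_label, _ = votes.most_common(1)[0]
        for page in corpus:
            # Manual labels are kept; other pages inherit the majority label.
            assigned[page] = seed_labels.get(page, majority_label)
    return assigned


if __name__ == "__main__":
    corpora = [["a", "b", "c"], ["d", "e"], ["f"]]
    seeds = {"a": "homepage", "d": "forum"}
    print(propagate_metadata(corpora, seeds))
```

In this sketch a corpus containing no manually qualified page stays unlabeled, which makes visible where additional manual qualification would be needed.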
In a hypertext, documents are seldom composed of a set of nodes instead of a single one. The informat...
The growth of the Web poses new challenges in Information Retrieval (IR). Most of current systems ar...
The Web today provides the general public with coexisting means of access to an endless variety of i...
In this thesis, which falls within the general context of information retrieval on the Web,...
ISBN 2-906855-18-9. Web resources are increasingly diverse, not only regarding thematic content but...
The Web is a huge source of information, and one of the main problems facing users is finding docume...
Given the large heterogeneity of the World Wide Web, using metadata on the sea...
The explosive growth of the web has led to a surge of research activity in the area of information ret...
Web resources are increasingly diverse, not only regarding thematic content but also related to t...
We describe a real experiment in building a thematic index of a scientific book. This book is ...
There are many ways to find information on the Web and search engines are the most frequently used t...