Web resources are increasingly heterogeneous, not only in thematic content but also in document type, geographic origin, level, language, etc. However, web search engines do not take this heterogeneity into account and offer only thematic, keyword-based access to documents. This paper presents a method for extracting homogeneous corpora of web documents. The method is based on link analysis, uses co-citation, and focuses in particular on the notion of web document type.
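To make the co-citation idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not specify data structures, thresholds, or how document type is inferred, so the function names, the `min_count` threshold, and the toy link graph are all assumptions made for illustration. The sketch only shows the general principle that two pages are co-cited when the same third page links to both, and that pages frequently co-cited with a seed page are candidates for the same corpus.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(outlinks):
    """Count, for each pair of pages, how many source pages link to both.

    `outlinks` maps a source page to the list of pages it links to.
    Pairs are stored as sorted tuples so (a, b) and (b, a) are the same key.
    """
    counts = defaultdict(int)
    for source, targets in outlinks.items():
        # Ignore self-links and repeated links from the same source.
        cited = sorted(set(targets) - {source})
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1
    return dict(counts)


def cocited_with(seed, counts, min_count=2):
    """Return pages co-cited with `seed` at least `min_count` times.

    `min_count` is a hypothetical threshold chosen for this example.
    """
    related = []
    for (a, b), n in counts.items():
        if n >= min_count and seed in (a, b):
            related.append(b if a == seed else a)
    return sorted(related)


# Hypothetical toy link graph: three hub pages linking to candidate documents.
example_outlinks = {
    "hub1": ["cv_a", "cv_b", "cv_c"],
    "hub2": ["cv_a", "cv_b"],
    "hub3": ["cv_b", "news_x"],
}

example_counts = cocitation_counts(example_outlinks)
example_related = cocited_with("cv_a", example_counts, min_count=2)
```

In this toy graph, `cv_a` and `cv_b` are linked together by two hubs, so `cv_b` is the only page that passes the threshold for the seed `cv_a`. A real system would build `outlinks` from crawled pages or from a search engine's backlink data, which is outside the scope of this sketch.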
In this paper we present an approach to structure learning in the area of web documents. This is don...
Gleim R, Mehler A, Dehmer M. Web corpus mining by instance of Wikipedia. In: Proceedings of the 11t...
The role of the Web for text corpus construction is becoming increasingly significant. However, the ...
ISBN 2-906855-18-9
In this thesis, which is part and parcel of the more general context of web information retrieval, w...
Given the large heterogeneity of the World Wide Web, using metadata on the sea...
The authors, who publish knowledge on the Web related to readable electronic d...
The conventional tools of the "web as corpus" framework rely heavily on URLs o...
The Web is a huge source of information, and one of the main problems facing users is finding docume...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
Since its foundation in May 2009, the médialab Sciences Po works to foster the use of digital method...
In this thesis, which falls within the general context of information retrieval on the Web,...