In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure in a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme utilizes useful visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback in web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
As web sites are getting more complicated, the construction of web information extraction systems be...
Web pages are typically designed for visual interaction. In order to support visual interaction t...
In the present work we suggest and test a new process of web information extraction. The proposed method c...
This work aims to provide a page segmentation algorithm which uses both visual and content informati...
Web pages consist of different segments, serving different purposes. Most common types of these s...
This paper presents experiments using an algorithm of web page topic segmentat...
Relations of algorithms for hidden web-focused information retrieval develop with it. When the stage...
As web sites are getting more complicated, the construction of web information extraction systems beco...
The aim of this work is to introduce a new vision-based web page segmentation method. This method is...
Apart from the main content blocks, almost all web pages on the Internet contain such blocks as navi...
The Web is increasingly becoming a very large information source. However, the information is visually...
Segmentation of WWW pages, or page division into different semantic blocks, is one of the disciplines of...
Content of the web page is the textual and graphical information that is related to the topi...
In this paper, we present a framework for evaluating segmentation algorithms f...