Some previous works show that a web page can be partitioned into multiple segments or blocks, and that the blocks in a page usually differ in importance. It has also been shown that distinguishing noisy or unimportant blocks in pages can facilitate web mining, search, and accessibility. However, these works present no uniform approach or model for measuring the importance of different portions of a web page. Through a user study, we found that people do have a consistent view of the importance of blocks in web pages. In this paper, we investigate how to build a model that automatically assigns importance values to the blocks in a web page. We formulate block importance estimation as a learning problem. First, we use the VIPS (VIsion-bas...
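The learning formulation above can be illustrated with a small sketch. It assumes each block produced by a segmenter such as VIPS has already been reduced to a few numeric features; the normalized position, relative area, and link-text ratio used here are hypothetical choices, not the features of the original work. It also assumes users have labeled training blocks with importance scores on a 1 to 4 scale (another assumption). A k-nearest-neighbor regressor, swapped in purely for brevity, then estimates the importance of an unseen block.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A page block described by hypothetical spatial and content features."""
    center_x: float    # horizontal center, normalized to [0, 1] by page width
    center_y: float    # vertical center, normalized to [0, 1] by page height
    area: float        # block area as a fraction of the page area
    link_ratio: float  # fraction of the block's text that is anchor text

    def features(self) -> tuple[float, ...]:
        return (self.center_x, self.center_y, self.area, self.link_ratio)


def distance(a: Block, b: Block) -> float:
    """Euclidean distance between two blocks in feature space."""
    return math.dist(a.features(), b.features())


def predict_importance(train: list[tuple[Block, float]], block: Block, k: int = 3) -> float:
    """Estimate importance as the mean label of the k nearest labeled blocks."""
    if not train:
        raise ValueError("need at least one labeled block")
    nearest = sorted(train, key=lambda pair: distance(pair[0], block))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical user-labeled blocks (1 = least important, 4 = most important).
labeled = [
    (Block(0.50, 0.45, 0.40, 0.05), 4.0),  # large, central, little anchor text
    (Block(0.50, 0.55, 0.30, 0.10), 4.0),
    (Block(0.10, 0.50, 0.10, 0.90), 2.0),  # narrow left column of links
    (Block(0.50, 0.03, 0.05, 0.80), 1.0),  # thin strip at the top of the page
    (Block(0.50, 0.97, 0.05, 0.70), 1.0),  # thin strip at the bottom of the page
]

if __name__ == "__main__":
    # A large central block with little anchor text resembles the content blocks.
    print(predict_importance(labeled, Block(0.48, 0.50, 0.35, 0.08), k=2))  # 4.0
```

k-NN keeps the sketch dependency-free; any supervised regressor or classifier could be substituted without changing the formulation of importance estimation as learning from labeled blocks.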
Searching for useful information on the web, a popular activity, often involves a huge amount of irrelevant conten...
In contrast to traditional document retrieval, a web page as a whole is not a good information unit ...
Web blocks such as navigation menus, advertisements, and headers and footers are key components of w...
The content of a web page is the textual and graphical information related to the topi...
Apart from the main content blocks, almost all web pages on the Internet contain such blocks as navi...
As web sites are getting more complicated, the construction of web information extraction systems beco...
Information extraction has become an important task for discovering useful knowledge or information ...
In this paper, we present a framework for evaluating segmentation algorithms f...
This paper proposes a new method for computing page importance, referred to as BrowseRank. The conve...
In this paper, we study the problem of learning block classification models to estimate block functi...
A commercial Web page typically contains many information blocks. Apart from the main content block...
Web pages consist of different segments, serving different purposes. The most common types of these s...