Abstract – The content of a web page is the textual and graphical information related to the topic of the page, and it is the focus of web data mining and information retrieval. Page content is the target of word segmentation and indexing for search engines, and of corpus collection (news, reviews, blogs, etc.) for knowledge management research. Extracting the content of web pages correctly and efficiently improves the accuracy of subsequent analysis, because it significantly reduces the noise in the pages, and it also lightens the workload of indexing and segmentation. In these works, no uniform approach or model is presented to measure the importance of different nested portions in web pages. Through a user study, we found that pe...
Apart from the main content blocks, almost all web pages on the Internet contain such blocks as navi...
Web pages consist not only of actual content but also of other elements such as branding banners, nav...
With the explosive growth of information sources available on the World Wide Web, it has become incr...
Some previous works show that a web page can be partitioned into multiple segments or blocks, and usua...
Information Extraction has become an important task for discovering useful knowledge or information ...
As web sites are getting more complicated, the construction of web information extraction systems be...
In contrast to traditional document retrieval, a web page as a whole is not a good information unit ...
Web pages consist of different segments, serving different purposes. Most common types of these s...
This work aims to provide a page segmentation algorithm which uses both visual and content informati...
In this paper, we present a framework for evaluating segmentation algorithms f...
In this work, we describe a new Web page segmentation method to extract the semantic structure from ...
This paper presents experiments using an algorithm of web page topic segmentat...