Segmenting a Web page into logical blocks is an important preprocessing step for recognizing the informative content blocks in a page, which in turn enables efficient information extraction and convenient display on devices with small screens. Previous methods for Web page segmentation are not flexible in a dynamic Web environment because they rely largely on heuristic rules derived from the structural tags and visual information inherent in a page. To resolve this problem, this paper proposes a new method of Web page segmentation that recognizes repetitive tag patterns, called key patterns, in the DOM tree structure of a page. We report on the Repetition-based Page Segmentation (REPS) algorithm, which detects key patterns in a page and ...
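To make the notion of a repetitive tag pattern concrete, the following is a minimal sketch, not the REPS algorithm itself, whose details are not given here. It assumes that a pattern is simply a run of child tags that repeats consecutively under one DOM node. The names `Node`, `TreeBuilder`, `best_repetition`, and `key_patterns` are hypothetical and introduced only for illustration.

```python
from html.parser import HTMLParser

# Elements that never have children; a small illustrative subset.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare tag tree (no text, no attributes) from simple HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def best_repetition(tags, min_repeats=2):
    """Return (pattern, start, repeats) for the consecutive repetition
    that covers the most tags, or None if nothing repeats."""
    best = None
    n = len(tags)
    for size in range(1, n // min_repeats + 1):
        for start in range(0, n - size * min_repeats + 1):
            pattern = tags[start:start + size]
            repeats, pos = 1, start + size
            while tags[pos:pos + size] == pattern:
                repeats += 1
                pos += size
            covered = repeats * size
            if repeats >= min_repeats and (best is None or covered > best[0]):
                best = (covered, tuple(pattern), start, repeats)
    return None if best is None else best[1:]


def key_patterns(html, min_repeats=2):
    """List (parent_tag, pattern, start, repeats) for every node whose
    children contain a repeated tag pattern (assumed definition)."""
    builder = TreeBuilder()
    builder.feed(html)
    found, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        hit = best_repetition([c.tag for c in node.children], min_repeats)
        if hit:
            found.append((node.tag, *hit))
        stack.extend(node.children)
    return found


sample = (
    "<div><h2>a</h2><p>x</p><h2>b</h2><p>y</p><h2>c</h2><p>z</p></div>"
    "<ul><li>1</li><li>2</li></ul>"
)
patterns = key_patterns(sample)
for parent, pattern, start, repeats in patterns:
    print(parent, pattern, start, repeats)
```

In this sketch, each repetition of a pattern under a node would be a candidate block boundary. A practical segmenter would also need to tolerate noise such as optional tags inside a repeated unit, which the exact-match comparison above does not handle.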
The aim of this work is to introduce a new vision based web page segmentation method. This method is...
In contrast to traditional document retrieval, a web page as a whole is not a good information unit ...
The increase in availability of hand-held devices capable of browsing the web, such as mobile phones...
The World Wide Web is a vast source of information accessible to computers, but most of its informa...
This paper proposes an enhanced method of Web information extraction by exploiting general phenomena...
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised ...
As web sites are getting more complicated, the construction of web information extraction systems be...
Web pages consist of different segments, serving different purposes. Most common types of these s...
In this paper, we present a framework for evaluating segmentation algorithms f...
In this paper we describe Block-o-Matic, a web page segmentation framework. It...
In this work, we describe a new Web page segmentation method to extract the semantic structure from ...
This paper presents experiments using an algorithm of web page topic segmentat...
This report deals with segmentation of web pages, which is an important discipline of information extra...
Web pages are typically designed for visual interaction. In order to support visual interaction t...