This paper proposes a new web content structure based on visual representation. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. The paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure. It simulates how a user understands web layout structure based on visual perception. Compared with existing techniques, the approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure. Experiments show satisfactory results.
Computer Science, Information Systems; Computer Science, Software Engineering; Comput...
In this paper we present a simple, robust, accurate and language-independent solution for extracting...
Extracting and processing information from Web pages is an important task in many areas like constru...
ABSTRACT: A web page usually contains various types of content such as navigation, decoration,...
Abstract: The World Wide Web is a distributed, heterogeneous and semi-structured information space. ...
Abstract: Despite the exponential WWW growth and the success of the Semantic Web, there is limited sup...
This thesis describes the design and implementation of an algorithm that, using some initial hints f...
This work aims to provide a page segmentation algorithm which uses both visual and content informati...
This paper presents experiments using an algorithm of web page topic segmentat...
There is a large amount of data available on the Web. Data are often represented as text, enriched w...
Abstract: The ability to create web pages is one of the basic IT skills. In the web page creation learning proc...
Recent work has shown the effectiveness of leveraging layout and tag-tree structure for segmenting w...
We present general-purpose methods for recognizing certain types of structure in HTML documents. The...
In this paper we propose a model of a Web site that describes the logical structure of the contained data. Fur...
Our research explores the possibility of categorizing webpages and webpage genre by structure or lay...
Abstract. Extracting and processing information from web pages is an important task in many areas li...