Web pages are typical semi-structured data. Several tree-based models have been proposed to describe the semantic content structure of web pages and so facilitate further content analysis. However, most existing models present only the segmentation hierarchy of content blocks rather than the semantic relationships among them. In this work, we propose a novel web page semantic structure model, called the Logical Structure Model, which presents more comprehensive structural information about web pages. Based on this model, hidden patterns in web content can be revealed more easily. The proposed model has been used to help identify course metadata in our Online Course Organization project, which aims to build an online course portal to serve...
Our research explores the possibility of categorizing webpages and webpage genre by structure or lay...
This work aims to provide a page segmentation algorithm which uses both visual and content informati...
Thesis (Ph.D.)--University of Washington, 2021. The World Wide Web contains countless semi-structured ...
The World Wide Web has become one of the most important information resources today. Web...
An important aspect of research for Web information extraction relates to the inference of complex r...
In data-intensive web sites pages are generated by scripts that embed data from a backend database i...
Content-related metadata plays an important role in the effort of developing intelligent web applica...
In this paper we propose a model of a Web site that describes the logical structure of the contained data. Fur...
In this work, we describe a new Web page segmentation method to extract the semantic structure from ...
Abstract. The paper proposes a data structure modelling method whose aim is to estimate a structure...
In systems that provide integrated management of both structured, database-style data and semistru...
In data-intensive web sites pages are generated by scripts that embed data from a back-end database...
To make real Web information more machine processable, this paper presents a new approach to intra-p...
Abstract. The World Wide Web includes several types of website applications. Mainly these application...