Abstract. Recently, there has been increased interest in the extraction of structured data from the Web (both “Surface” Web and “Hidden” Web). In this paper we focus on the automatic extraction of Web lists. Although this task has been studied extensively, existing approaches assume that lists are wholly contained in a single Web page. They do not consider that many websites spread their listings across several Web pages and show only a partial view on each of them. Similar to databases, where a view can represent a subset of the data contained in a table, these sites split a logical list into multiple views (view lists). Automatic extraction of logical lists is an open problem. To tackle this issue we propose an unsupervise...
In this paper we propose a model of a Web site that describes the logical structure of contained data. Fur...
In this thesis, we explore current approaches for automatic web data extraction, define their limita...
Abstract. Information extraction from websites is nowadays a relevant problem, usually performed by ...
The discovery and extraction of general lists on the Web continues to be an important problem facing...
The problem of extracting structured data (i.e. lists, record sets, tables, etc.) from the Web has ...
Many Web sites, especially those that dynamically generate HTML pages to display the results of a us...
A large number of web pages contain information about entities in lists where the lists are represen...
The semi-structured information available in HTML and similar documents provides valuable information...
We propose two methods for constructing automated programs for extraction of information from a class...
Abstract. In order to extract entities of a fine-grained category from semi-structured data in web pa...
The World Wide Web has become one of the most important information resources today. Web...
This paper studies structured data extraction from template-generated Web pages. Such pages contain ...
Abstract—The World Wide Web includes several types of website applications. Mainly these application...