Abstract. In order to extract entities of a fine-grained category from semi-structured data in web pages, existing information extraction systems rely on seed examples or redundancy across multiple web pages. In this paper, we consider a new zero-shot learning task of extracting entities specified by a natural language query (in place of seeds) given only a single web page. Our approach defines a log-linear model over latent extraction predicates, which select lists of entities from the web page. The main challenge is to define features on widely varying candidate entity lists. We tackle this by abstracting list elements and using aggregate statistics to define features. Finally, we created a new dataset of diverse queries and web pages, and...
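The abstract names the model but not its concrete features. The following is a minimal sketch of the general idea, not the paper's method. It assumes that candidate extraction predicates are given as a hypothetical mapping from path strings to the element lists they select, and it uses an illustrative feature set (word-shape homogeneity, log list size, mean token count) with hand-set weights `theta`. It shows how abstracting each element into a coarse "shape" and then aggregating statistics yields a fixed-size feature vector for lists of any length. It also shows how a log-linear (softmax) distribution, p(z) ∝ exp(θ · φ(z)), ranks predicates. For simplicity it ignores the natural language query, which the actual task conditions on.

```python
import math
import re
from collections import Counter


def shape(text):
    """Abstract a list element into a coarse word shape (illustrative abstraction)."""
    s = re.sub(r"[A-Z]", "A", text)
    s = re.sub(r"[a-z]", "a", s)
    s = re.sub(r"[0-9]", "0", s)
    # Collapse runs of the same character, e.g. "Aaaaa aa" -> "Aa a"
    return re.sub(r"(.)\1+", r"\1", s)


def features(elements):
    """Aggregate statistics over a non-empty candidate list (hypothetical feature set)."""
    shapes = Counter(shape(e) for e in elements)
    # Fraction of elements sharing the most common shape
    homogeneity = shapes.most_common(1)[0][1] / len(elements)
    mean_tokens = sum(len(e.split()) for e in elements) / len(elements)
    return {
        "shape_homogeneity": homogeneity,
        "log_size": math.log(len(elements)),
        "mean_tokens": mean_tokens,
    }


def predicate_distribution(candidates, theta):
    """Log-linear model over predicates: p(z) proportional to exp(theta . phi(z))."""
    scores = {}
    for pred, elements in candidates.items():
        if not elements:  # a predicate selecting nothing gets no score
            continue
        phi = features(elements)
        scores[pred] = sum(theta.get(k, 0.0) * v for k, v in phi.items())
    if not scores:
        return {}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {p: math.exp(s - m) for p, s in scores.items()}
    total = sum(exps.values())
    return {p: e / total for p, e in exps.items()}


# Usage with made-up candidates; paths and weights are hypothetical.
candidates = {
    "/html/body/ul[1]/li": ["Everest", "Kilimanjaro", "Denali", "Aconcagua"],
    "/html/body/div/a": ["Home", "About us", "Contact", "Terms of use"],
}
theta = {"shape_homogeneity": 2.0, "log_size": 0.5, "mean_tokens": -0.3}
dist = predicate_distribution(candidates, theta)
best = max(dist, key=dist.get)
print(best, dist[best])
```

Because every feature is an aggregate (a fraction, a log count, or a mean), lists of 4 or 400 elements map to the same three-dimensional vector. This is what lets one weight vector score widely varying candidate lists. In a learned system, `theta` would be fit from data rather than set by hand.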
In the last two decades, a huge amount of data has become increasingly available due to the exponen...
We propose two methods for constructing automated programs for extraction of information from a class...
Acquiring vast bodies of knowledge in machine-understandable form is one of the main challenges in a...
A large number of web pages contain information about entities in lists where the lists are represen...
This paper describes a system for entity extraction from the web. The system uses three different ex...
The thesis treats automatic extraction of semantic data from Web pages. Within this broad problem, i...
We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML s...
Abstract. This paper studies structured data extraction from Web pages, e.g., online product descrip...
Abstract. This paper describes a system for entity extraction from the web. The system uses three di...
Web extraction is the task of turning unstructured HTML into structured data. Previous approaches re...
Thesis (Ph.D.), University of Washington, 2015-12. With the advent of the Web, textual information has...
This paper discusses the problem of information extraction from such web pages. Internet, especially ...
Popular entities often have thousands of instances on the Web. In this paper, we focus on the case w...
Web search engines can greatly benefit from knowledge about attributes of entities present in search...