Relational Web tables have become an important resource for applications such as factual search and entity augmentation. A major challenge for an automatic identification of relevant tables on the Web is the fact that many of these tables have missing or non-informative column labels. Research has focused largely on recovering the meaning of columns by inferring class labels from the instances using external knowledge bases. The table context, which often contains additional information on the table's content, is frequently considered as an indicator for the general content of a table, but not as a source for column-specific details. In this paper, we propose a novel approach to identify and extract column-specific information from the cont...
Tables are a universal idiom to present relational data. Billions of tables on Web pages express ent...
Cross-domain knowledge bases such as YAGO, DBpedia, or the Google Knowledge Graph are being used as ...
We present a method based on header paths for efficient and complete extraction of labeled data from...
The Web provides a platform for people to share their data, leading to an abundance of accessible in...
The Web contains a wealth of information, and a key challenge is to make this information machine pr...
Purpose – The aim of this paper is to propose a strategy for extracting information from web tables....
Best Paper Award © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE m...
Tabular data is an abundant source of information on the Web, but remains mostly isolated from the l...
The thesis treats automatic extraction of semantic data from Web pages. Within this broad problem, i...
The World-Wide Web consists not only of a huge number of un-structured texts, but also a vast amount...
Abstract. In most applications of paraphrasing, contextual information should be considered since a ...
Previous work on content extraction utilized various heuristics such as link to text ratio, prominen...
In recent years, there has been an increasing interest in extracting and annotating tables on the We...
HTML tables on web pages ("web tables") have been used successfully as a data source for several app...