Tables are a common means of displaying data in human-friendly formats. Many authors have worked on proposals to extract those data again, since doing so has many interesting applications. In this article, we summarise and compare many of the proposals to extract data from tables that are encoded using HTML and have been published between 2000 and 2018. We first present a vocabulary that homogenises the terminology used in this field; next, we use it to summarise the proposals; finally, we compare them side by side. Our analysis highlights several challenges to which no proposal provides a conclusive solution and a few more that have not been addressed sufficiently; simply put, no proposal provides a complete solution to the problem, which ...
We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree...
This paper describes a method to extract ontologies from tables in the World Wide Web (WWW). A table...
The World Wide Web has an enormous amount of useful data presented as HTML tables. These tables are ...
Extracting data from user-friendly HTML tables is difficult because of their different layouts, for...
HTML tables have become pervasive on the Web. Extracting their data automatically is difficult beca...
The Web provides many data that are encoded using HTML tables. This facilitates rendering them, but...
The Web provides a platform for people to share their data, leading to an abundance of accessible in...
The Web contains a wealth of information, and a key challenge is to make this information machine pr...
Table extraction is the task of locating tables in a document and extracting their content along wit...
Automating the conversion of human-readable HTML tables into machine-readable relational tables will...
The process of data extraction from internet sources has been originating the ...
We present a method based on header paths for efficient and complete extraction of labeled data from...
This paper plans an end-to-end method for extracting information from tables embedded in documents; ...
HTML tables on web pages ("web tables") have been used successfully as a data source for several app...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains struct...