Summary GitTables 1M (https://gittables.github.io) is a corpus of currently 1M relational tables extracted from CSV files in GitHub repositories that are associated with a license allowing distribution. We aim to grow this to at least 10M tables. Each Parquet file in this corpus represents a table with the original content (e.g. values and header) as extracted from the corresponding CSV file. Table columns are enriched with annotations corresponding to more than 2K semantic types from Schema.org and DBpedia; these annotations are provided as metadata of the Parquet file. The column annotations include, for example, semantic types, hierarchical relations to other types, and descriptions. We believe GitTables can facilitate many use cases, among which: Da...
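Since the annotations are described as metadata of each Parquet file, a minimal sketch of reading one table together with its metadata might look like the following. It assumes the third-party `pyarrow` package is installed and that the metadata values are JSON-encoded strings; the summary does not confirm the encoding or the key names, so the code decodes every key it finds and falls back to plain text where JSON parsing fails. The function names and the file path are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def decode_parquet_metadata(raw: Optional[Dict[bytes, bytes]]) -> Dict[str, Any]:
    """Decode Parquet key-value metadata (bytes to bytes) into Python objects.

    Assumption: values may be JSON-encoded. Values that are not valid JSON
    are kept as plain strings rather than dropped.
    """
    decoded: Dict[str, Any] = {}
    for key, value in (raw or {}).items():
        name = key.decode("utf-8", errors="replace")
        text = value.decode("utf-8", errors="replace")
        try:
            decoded[name] = json.loads(text)
        except json.JSONDecodeError:
            decoded[name] = text
    return decoded


def load_table_with_annotations(path: str):
    """Read one Parquet file and return (table, decoded schema metadata)."""
    # Imported lazily so the decoding helper above works without pyarrow.
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    # schema.metadata is a bytes-keyed dict, or None when no metadata is set.
    return table, decode_parquet_metadata(table.schema.metadata)
```

A call such as `table, meta = load_table_with_annotations("some_table.parquet")` followed by `print(list(meta))` would list whichever metadata keys a given file actually carries, which is a reasonable first step before relying on any particular key.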
Abstract. This Big Data Track submission demonstrates how the BTC 2014 dataset, Microdata annotation...
Tough Tables (2T) is a dataset designed to evaluate table annotation approaches on the CEA task. The...
WikiDBs (https://wikidbs.github.io/) is a corpus of relational databases built from Wikidata (https:...
Summary GitTables (https://gittables.github.io) is a corpus of currently 1.7M relational tables ext...
Note: version 0.0.5 includes the files of version 0.0.4, yielding duplication in topic subsets. This...
This dataset contains >800K CSV files behind the GitTables 1M corpus. For more information about th...
Note: the download page of the entire GitTables corpus is here: https://zenodo.org/record/4943312. ...
Note: the entire GitTables corpus is here. Visit https://gittables.github.io for more background and...
This is an old version. The correct GitTables 1.7M corpus can be found here: https://zenodo.org/rec...
Data sets used for experimental evaluation in the related publication: Matching Web Tables with Kno...
This dataset contains the SQL tables of the training and test datasets used in our experimentation. ...
Collecting and refining research data or writing software is a part of many researchers' daily routi...
This dataset contains the SQL tables of the training and test datasets used in our experimentation. ...
Understanding the semantics of table elements is a prerequisite for many data integration and data d...
With the growing popularity of GitHub, the largest online provider of program source code and the...