Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the richness of dataset relationships. With relational data, the challenge lies in identifying join paths that best augment a feature table to increase the performance of a model. In this paper we propose a two-step, automated data augmentation approach for relational data that involves: (i) enumerating join paths of various lengths given a base table and (ii) ranking the join paths using filter methods for feature selection. We show that our approach can improve predictio...
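The two steps named in the abstract, (i) enumerating join paths from a base table and (ii) ranking them with a filter method, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the toy tables, the `EDGES` join graph, the helper names, and the choice of absolute Pearson correlation as the filter score (the abstract does not say which filter methods are used). It also assumes many-to-one joins in which every key finds a match.

```python
from math import sqrt

# Hypothetical toy tables (column name -> list of values); not from the paper.
TABLES = {
    "base": {"id": [1, 2, 3, 4], "city_id": [10, 20, 10, 30],
             "label": [1.0, 0.0, 1.0, 0.0]},
    "city": {"city_id": [10, 20, 30], "region_id": [100, 200, 200],
             "population": [9.0, 2.0, 1.0]},
    "region": {"region_id": [100, 200], "gdp": [5.0, 1.0]},
}

# Hypothetical join graph: table -> [(joinable table, shared key column)].
EDGES = {
    "base": [("city", "city_id")],
    "city": [("base", "city_id"), ("region", "region_id")],
    "region": [("city", "region_id")],
}


def enumerate_join_paths(base, edges, max_len):
    """Step (i): all simple join paths from `base` with 1..max_len joins."""
    paths = []
    stack = [(base, [], {base})]
    while stack:
        table, path, seen = stack.pop()
        if path:
            paths.append(path)
        if len(path) == max_len:
            continue
        for nxt, key in edges.get(table, []):
            if nxt not in seen:  # never revisit a table on the same path
                stack.append((nxt, path + [(table, nxt, key)], seen | {nxt}))
    return paths


def to_rows(table):
    """Convert a column-oriented table into a list of row dicts."""
    n = len(next(iter(table.values())))
    return [{col: vals[i] for col, vals in table.items()} for i in range(n)]


def apply_path(path, tables):
    """Join along the path; assumes many-to-one joins with no missing keys."""
    rows = to_rows(tables[path[0][0]])
    for _, right, key in path:
        lookup = {r[key]: r for r in to_rows(tables[right])}
        rows = [{**lookup[r[key]], **r} for r in rows]
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def score_path(path, tables, edges, base, target):
    """Step (ii): best |correlation| with the target among newly added columns."""
    keys = {key for nbrs in edges.values() for _, key in nbrs}
    base_cols = set(tables[base])
    rows = apply_path(path, tables)
    new_cols = [c for c in rows[0] if c not in base_cols and c not in keys]
    ys = [r[target] for r in rows]
    return max((abs(pearson([r[c] for r in rows], ys)) for c in new_cols),
               default=0.0)


paths = enumerate_join_paths("base", EDGES, max_len=2)
ranked = sorted(((score_path(p, TABLES, EDGES, "base", "label"), p) for p in paths),
                key=lambda sp: sp[0], reverse=True)
for score, path in ranked:
    print(round(score, 3), " -> ".join([path[0][0]] + [step[1] for step in path]))
```

In this toy data the longer path through `region` ranks first because `gdp` separates the labels perfectly, which shows why paths of more than one join are worth enumerating. A real filter score would also need to handle missing matches, categorical columns, and redundancy between features.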
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
We present a simple conceptual framework to think about computing the relational join. Using this fr...
Data mining aims at discovering important and previously unknown patterns from the dataset in the un...
Automatic machine learning is a subfield of machine learning that automates the common procedures fa...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
Techniques for identifying joinable or unionable tables in data lakes can yield valuable information...
Closer integration of machine learning (ML) with data processing is a booming area in both the data ...
We describe a method of inferring join plans for a set of relation instances, in the absence of any ...
Enterprise data analytics is a booming area in the data management industry. Many companies are rac...
© 2020, VLDB Endowment. Automatic machine learning (AML) is a family of techniques to automate the p...
We study the problem of discovering joinable datasets at scale. We approach the problem from a learn...
In many data mining tools that support regression tasks, training data are stored in a single table ...
One fundamental limitation of classical statistical modeling is the assumption that data is represen...
The democratization of data science, and in particular of the machine learning pipeline, has focused...
With data becoming more and more complex, the standard tabular data format often does not suffice to...