We consider the problem of computing machine learning models over multi-relational databases. The mainstream approach involves a costly repeated loop that data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. In this thesis, we advocate for an alternative approach that avoids this loop and instead tightly integrates the query and learning tasks into one unified solution. The primary observation is that the data-intensive computation for a va...
Which doctors prescribe which drugs to which patients? Who upvotes which answers on what topics on Q...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
Machine learning (ML) pipelines for model training and validation typically include preprocessing, s...
In this talk, I will make the case for a first-principles approach to machine learning over relation...
This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optim...
Integrated solutions for analytics over relational databases are of great practical importance as th...
In the big data era, the use of large-scale machine learning methods is becoming ubiquitous in data ...
Query optimization is crucial for any data management system to achieve good performance. Recent adv...
Multi-relational data mining algorithms search a large hypothesis space in order to find a suitable ...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
The primary difference between propositional (attribute-value) and relational data is the existence ...
In this paper, we study the issue of how to scale machine learning algorithms that are typically des...
Statistical relational learning techniques have been successfully applied in a wide range of relatio...
Enterprise data analytics is a booming area in the data management industry. Many companies are rac...
Many machine learning applications that involve relational databases incorporate first-order logic a...