Machine Learning models are often composed of pipelines of transformations. While this design allows to efficiently execute single model components at training-time, prediction serving has different requirements such as low latency, high throughput and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL is able to introduce performance improvements over di...
After a decade of accelerated progress in the different areas of machine learning (ML), it has becom...
Time series forecasting has attracted the attention of the machine learning (ML) community to produc...
To serve machine learning requests with trained models plays an increasingly important role with the...
Machine Learning models are often composed of pipelines of transformations. While this design allows...
Machine learning is being deployed in a growing number of applications which demand real- time, accu...
Machine Learning models are often composed by sequences of transformations. While this design makes ...
Tree-based models have proven to be an effective solution for web ranking as well as other problems ...
We identified the specific predictors we will be using: • Stride Based: A low latency predictor [5] ...
The cost efficiency of model inference is critical to real-world machine learning (ML) applications,...
Machine learning practitioners often face a fundamental trade-off between expressiveness and computa...
Machine learning (ML) pipelines for model training and validation typically include preprocessing, s...
Machine learning (ML) is now commonplace, powering data-driven applications in various organizations...
Abstract—Modern processors are equipped with multiple hardware prefetchers, each of which targets a ...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
The solutions to many problems in computer architecture involve predictions, which are often based o...
After a decade of accelerated progress in the different areas of machine learning (ML), it has becom...
Time series forecasting has attracted the attention of the machine learning (ML) community to produc...
To serve machine learning requests with trained models plays an increasingly important role with the...
Machine Learning models are often composed of pipelines of transformations. While this design allows...
Machine learning is being deployed in a growing number of applications which demand real- time, accu...
Machine Learning models are often composed by sequences of transformations. While this design makes ...
Tree-based models have proven to be an effective solution for web ranking as well as other problems ...
We identified the specific predictors we will be using: • Stride Based: A low latency predictor [5] ...
The cost efficiency of model inference is critical to real-world machine learning (ML) applications,...
Machine learning practitioners often face a fundamental trade-off between expressiveness and computa...
Machine learning (ML) pipelines for model training and validation typically include preprocessing, s...
Machine learning (ML) is now commonplace, powering data-driven applications in various organizations...
Abstract—Modern processors are equipped with multiple hardware prefetchers, each of which targets a ...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
The solutions to many problems in computer architecture involve predictions, which are often based o...
After a decade of accelerated progress in the different areas of machine learning (ML), it has becom...
Time series forecasting has attracted the attention of the machine learning (ML) community to produc...
To serve machine learning requests with trained models plays an increasingly important role with the...