Serving machine-learning (ML) models at low latency requires clusters equipped with expensive hardware accelerators (e.g., GPUs) to reduce execution time. Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) cost-effectively. Autoscaling mechanisms that greedily minimize the number of service instances while maintaining SLO compliance are helpful. However, we find that they are not sufficient to guarantee cost effectiveness across heterogeneous GPU hardware, nor do they maximize resource utilization. In this paper, we propose HetSev to address these challenges by incorporating heterogeneity-aware autoscaling and resource-efficient scheduling to achieve cost ef...
Machine learning inference is increasingly being executed locally on mobile and embedded platforms, ...
GPU technology has been improving at an expedited pace in terms of size and performance, empowering ...
As many-core accelerators keep integrating more processing units, it becomes increasingly more diffi...
With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources f...
Our work seeks to improve and adapt computing systems and machine learning (ML) algorithms to match ...
A plethora of applications are using machine learning, the operations of which are becoming more com...
The use of machine learning (ML) inference for various applications is growing drastically. ML infer...
Large scale machine learning has many characteristics that can be exploited in the system designs to...
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimall...
There is an increased interest in building machine learning frameworks with advanced algebraic capab...
While heterogeneous architectures are increasingly popular with High Performance...
Modern deep learning systems like PyTorch and Tensorflow are able to train enormous models with bill...
As artificial intelligence (AI) and machine learning (ML) technologies disrupt a wide range of indus...
While machine learning (ML) has been widely used in real-life applications, the complex nature of re...
For dynamic and continuous data analysis, conventional OLTP systems are too slow. Today's...