The Deep Learning (DL) paradigm has gained remarkable popularity in recent years. DL models are used to tackle increasingly complex problems, so training them requires considerable computational power. The parallel computing capabilities of modern GPUs partially meet this need, but the high cost of GPU-as-a-Service solutions in the cloud calls for efficient capacity planning and job scheduling algorithms that reduce operational costs through resource sharing. In this work, we jointly address the online capacity planning and job scheduling problems from the perspective of cloud end-users. We present a Mixed Integer Linear Programming (MILP) formulation and a path relinking-based method aiming at optimizing operational cos...
Cloud computing is an emerging technology that is increasingly being appreciated for its diverse use...
This work is focused on the issue of job scheduling in high-performance computing systems. The goa...
The explosion of data has transformed the world since much more information is available for collect...
Deep Learning (DL) methods currently address a variety of complex tasks. GPUs significantly accelera...
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud,...
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware...
Deep learning (DL) training jobs bring some unique challenges to existing cluster managers, such as ...
Artificial Intelligence (AI) and Deep Learning (DL) algorithms are currently applied to a wide range...
Training large neural networks with huge amounts of data using multiple Graphic Processi...
DL has pervaded many areas of computing due to the confluence of the explosive growth of large-scale...
Deep learning (DL) training jobs now constitute a large portion of the jobs in the GPU clusters. Fol...
Scheduling involves allocating shared resources gradually so that tasks can be completed within a pr...
The advent of deep learning has completely reshaped our world. Now, our daily life is filled with...
Cloud computing refers to services that run in a distributed network and are accessible through comm...