Modern High Performance Computing (HPC) clusters often comprise a large number of computing resources with different capabilities, which makes them heterogeneous and difficult to manage. In addition, they must serve a wide range of applications with different requirements. Together, these factors pose a great challenge to the workload managers that assign applications to resources. Many recent proposals address this challenge, including some that employ Deep Reinforcement Learning (DRL) techniques. This paper proposes a novel simulation framework for the study of workload managers, conceived in particular to foster research on workload managers based on DRL techniques. Its main features include the simulation of heterogeneous clusters based on mu...
MapReduce is presently established as an important distributed and parallel programming mode...
In this paper we present a model and simulator for many clusters of heterogene...
Energy consumption in large-scale distributed systems, such as computational grids and clouds, gains ...
Resource usage of production workloads running on shared compute clusters often fluctuates significan...
In the heterogeneous computing environment, programmers map the applications either on CPUs or GPUs....
Dynamic scheduling of tasks in large-scale HPC platforms is normally accomplis...
Combining reinforcement learning with deep learning is, as of today, one of the most...
The amount of data generated by computing clusters is very large, including nodes resources data or ...
We report on the improvements that can be achieved by applying machine learning techniques, in parti...
In this paper we present a model and simulator for many clusters of heterogeneous PCs belonging to a...
The computational burden and the time required to train a deep reinforcement learning (DRL) can be a...
In recent years, energy consumption has become a limiting factor in the evolution of high-performance...
Large-scale machine learning models are routinely trained in a distributed fashion due to their incr...