A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments. Its main intelligence lies in the resource selection techniques it uses to find the most suitable resources on which to schedule users' jobs. This paper introduces a new method that takes into account both the network topology of the machine and the application's communication pattern to determine the best choice among the available nodes of the platform. To validate our approach, we integrate this algorithm as a plugin for Slurm, a well-known and widespread RJMS...
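To make the idea of topology-aware resource selection concrete, here is a minimal sketch. It is not the paper's algorithm or Slurm's plugin API: the two-level tree, the hop counts, the node and switch names, and the exhaustive search are all assumptions chosen for illustration. It scores a candidate node selection by summing, over every pair of processes, the communication volume multiplied by the network distance between the nodes hosting them, and keeps the cheapest selection.

```python
from itertools import combinations, permutations

# Hypothetical two-level tree: each leaf switch hosts two compute nodes.
SWITCH_OF = {
    "n0": "s0", "n1": "s0",
    "n2": "s1", "n3": "s1",
}


def hops(a, b):
    """Hop count between two nodes in the assumed two-level tree."""
    if a == b:
        return 0
    # Same leaf switch: up and down (2 hops). Otherwise go through the root (4 hops).
    return 2 if SWITCH_OF[a] == SWITCH_OF[b] else 4


def cost(placement, comm):
    """Sum of communication volume times hop distance over all process pairs.

    placement[i] is the node hosting process i; comm is a symmetric matrix.
    """
    total = 0
    for i, j in combinations(range(len(placement)), 2):
        total += comm[i][j] * hops(placement[i], placement[j])
    return total


def best_selection(free_nodes, comm):
    """Try every assignment of processes to distinct free nodes (brute force).

    Only practical for tiny inputs; a real scheduler would need a heuristic.
    """
    n = len(comm)
    return min(permutations(free_nodes, n), key=lambda p: cost(p, comm))


if __name__ == "__main__":
    # Two processes exchanging 10 units of data; n1 is assumed to be busy.
    comm = [[0, 10],
            [10, 0]]
    print(best_selection(["n0", "n2", "n3"], comm))
```

With `n1` busy, the only pair of free nodes under a common switch is `n2` and `n3`, so the sketch selects those two rather than pairing `n0` with a node across the root switch.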
The evolution of massively parallel supercomputers makes palpable two issues in...
Abstract. With the exponential growth of distributed computing systems in both flops and cores, s...
Interconnection networks in parallel platforms can be made of thousands of nod...
The Resource and Job Management System (RJMS) is a crucial system software par...
SLURM is a popular resource management system that is used on many supercomputers in the TOP500 list...
Abstract. The Resource and Job Management System (RJMS) is the middleware in charge of delivering c...
Process mapping (or process placement) is a useful algorithmic technique to op...
Peer reviewed. High Performance Computing (HPC) is nowadays a strategic asset required to sustain the ...
Considering the large number of processors and the size of the interconnection...
The increasing complexity of parallel computing platforms requires a deep know...
Petascale machines with hundreds of thousands of cores are being built. These machines have varying ...
To be held in conjunction with SC21. Processor architectures at exascale and bey...
Abhinav Bhatele, Ph.D. student at the Parallel Programming Lab at the University of Illinois present...
Efficiently programming shared-memory machines is a difficult challenge becaus...
Parallel computing platforms are increasingly complex, with multiple cores, shared caches, and NUMA ...