The allocation of jobs to nodes and cores in industrial clusters is often based on queue-system standard settings, guesses or perceived fairness between different users and projects. Unfortunately, hard empirical data is often lacking and jobs are scheduled and co-scheduled for no apparent reason. In this case study, we evaluate the performance impact of co-scheduling jobs using three types of applications and an existing 450+ node cluster at a company doing large-scale parallel industrial simulations. We measure the speedup when co-scheduling two applications together, sharing two nodes, compared to running the applications on separate nodes. Our results and analyses show that by enabling co-scheduling we improve performance in the order o...
Due to copyright restrictions, the access to the full text of this article is only available via sub...
Modern high-performance computing (HPC) system designs have converged to heavyweight nodes with grow...
Recent emerging applications from a wide range of scientific domains often require a very large numb...
We introduce a methodology for the study of the application-level performance of time-sharing parall...
In this paper, we utilize a bandwidth-centric job communication model that captures the i...
In a multicore processor system, running multiple applications on different cores in the same chip c...
Work not yet received by the library - data not verified. Variant title according to translat...
In recent years, the number of processing units per compute node has been increasing. In order to ut...
In systems consisting of multiple clusters of processors which are interconnected by relatively slow...
A dedicated cluster is often not fully utilized even when all of its processors are allocated to job...
It is common that multiple cores reside on the same chip and share the on-chip cache. As a result, r...
In systems consisting of multiple clusters of processors interconnected by relatively slow network c...
Over the last decade, much research in the area of scheduling has concentrated on single cluster sys...
This paper investigates co-scheduling algorithms for processing a set of paral...