In this paper we introduce a methodology for dynamic job reconfiguration driven by the programming model runtime in collaboration with the global resource manager. We improve system throughput by exploiting malleability (in terms of the number of MPI ranks): the resources assigned to a job are reallocated during its execution. In our proposal, the OmpSs runtime reconfigures the number of MPI ranks during the execution of an application in cooperation with the Slurm workload manager. In addition, we take advantage of OmpSs offload semantics to allow application developers to deal with data redistribution. By combining these elements, a job is able to expand itself in order to exploit idle nodes or be shrunk if other queued jobs...
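To make the data redistribution step concrete, the following is a minimal sketch of what changing the number of ranks implies for an application's data. It is not the OmpSs or Slurm interface described in the paper. It assumes a one-dimensional array split into contiguous blocks, and it assumes that rank identifiers are preserved across a resize; the function names `block_bounds` and `redistribution_plan` are hypothetical and chosen only for illustration.

```python
def block_bounds(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) owned by `rank`.

    Assumption: a plain block distribution in which the first
    `n_items % n_ranks` ranks hold one extra element.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def redistribution_plan(n_items, old_ranks, new_ranks):
    """List the transfers needed when a job is resized.

    Each entry is (src_rank, dst_rank, start, stop). Elements that stay
    on the same rank id are not listed, since (by assumption) that rank
    survives the resize and already holds them.
    """
    plan = []
    for dst in range(new_ranks):
        d_start, d_stop = block_bounds(n_items, new_ranks, dst)
        for src in range(old_ranks):
            s_start, s_stop = block_bounds(n_items, old_ranks, src)
            # Overlap between the old owner's block and the new owner's block.
            lo, hi = max(d_start, s_start), min(d_stop, s_stop)
            if lo < hi and src != dst:
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Expanding from 2 to 4 ranks, then shrinking back from 4 to 2.
    print(redistribution_plan(8, 2, 4))
    print(redistribution_plan(8, 4, 2))
```

Under these assumptions, expanding and shrinking are the same computation with the rank counts swapped, which is why a single plan function covers both directions.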
Load imbalance is a long-standing source of inefficiency in high performance computing. The situati...
As new heterogeneous systems and hardware accelerators appear, high performance computers can reach ...
This paper discusses the need for resource management support for parallel applications running on w...
Adaptive workloads can change on-the-fly the configuration of their jobs, in terms of number of pro...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...
In the design of future HPC systems, research in resource management is showing an increasing intere...
Process malleability has proved to have a highly positive impact on the resource utilization and glo...
Several studies have proved the benefits of job malleability, that is, the capacity of an applicatio...
Applications in science and engineering often require huge computational resources for solving probl...
Current parallel architectures take advantage of new hardware evolution, like the u...
The work in this paper focuses on providing malleability to MPI applications by using a novel perfor...
This work focuses on scheduling of MPI jobs when executing in shared-memory multiprocessors (SMPs). ...
The ever-increasing supercomputer architectural complexity emphasizes the need...