This work presents an HPC framework that provides new strategies for resource management and job scheduling. It executes different applications on shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that analyzes the utilization of the platform's compute nodes. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, that combines the information provided by the monitor with application-level analysis to detect performance degradation in running applications. This degradation, which arises because applications sharing compute nodes may compete for their resources, is avoided by means...
The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures de...
To help shrink the programmability-performance efficiency gap, we discuss that adaptive runtime syst...
Application performance can degrade significantly due to node-local load imbalances during applicati...
Next generation HPC applications will increasingly time-share system resources with emerging workloa...
Individual processor frequencies have reached an upper physical and practical limit. Processor desig...
Many breakthroughs in scientific and industrial research are supported by simulations and calculatio...
This work presents a common framework that integrates CLARISSE, a cross-layer runtime for the I/O so...
The field of High Performance Computing (HPC) is characterized by the continuous evolution of comput...
The scheduling of parallel tasks is a topic that has received a lot of attenti...
Network interference from nearby jobs has recently been identified as the dominant reason for the high...
In HPC platforms, concurrent applications share the same file system. This can lead to conflic...
Programming paradigms in High-Performance Computing have been shifting towards...