Improved resource utilization and fault tolerance of large-scale HPC systems can be achieved through fine-grained, intelligent, and dynamic resource (re)allocation. We explore components and enabling technologies for building a system that provides this capability, specifically: 1) scalable fine-grained monitoring and analysis to inform resource allocation decisions, 2) virtualization to enable dynamic reconfiguration, 3) resource management for the combined physical and virtual resources, and 4) orchestration of the allocation, evaluation, and balancing of resources in a dynamic environment. We discuss both general and HPC-centric issues that impact the design of such a system. Finally, we present our prototype system, giving both des...
This report provides documentation for the completion of the Sandia Level II milestone 'Develop feed...
Resource management is a well-known problem in almost every computing system ranging from embedded t...
In order to get more results or greater accuracy, computational scientists execute mainly parallel o...
This dissertation explores the viability of virtualization techniques to address the challenges that...
High-Performance Computing (HPC) is rapidly moving towards the adoption of nodes characterized by a...
2021 Summer. Includes bibliographical references. The need for high performance computing (HPC) resour...
To sustain performance while facing ever tighter power and energy envelopes, High Performance Comp...
For the execution of scientific applications, different methods have been proposed to dynamicall...
Platform virtualization helps solve major grid computing challenges: share resource with flexible,...
The primary motivation for uptake of virtualization has been resource isolation, capacity management...
This work aims to achieve better management of physical resources by dynamically reallocating and ad...
Heterogeneous System Architectures (HSA) are gaining importance in the High Performance Computing (H...
In this report we demonstrate the potential utility of resource allocation management systems that ...