The monitoring and system analysis of high-performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and to diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques are particularly well suited to this problem and provide an effective means of discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use o...
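To make the two relationship factors named above concrete (shared compute nodes and proximity in time), here is a minimal sketch of how a job-relationship graph might be built. It is not the method from the abstract. The `Job` record, its fields, the `max_gap` threshold, and the helper names `time_gap` and `build_job_graph` are hypothetical and used only for illustration. The sketch links two jobs when they share at least one node or run within `max_gap` seconds of each other.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    job_id: str
    nodes: frozenset  # compute nodes the job ran on
    start: float      # start time in seconds
    end: float        # end time in seconds


def time_gap(a: Job, b: Job) -> float:
    """Seconds between two jobs' run intervals; 0 if the intervals overlap."""
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))


def build_job_graph(jobs, max_gap=60.0):
    """Return {(id_a, id_b): set_of_reasons} for every related pair of jobs.

    Assumed edge rules (hypothetical): "shared_node" if the jobs ran on at
    least one common node, "temporal" if their run intervals are at most
    max_gap seconds apart. Pairwise O(n^2) comparison, fine for a sketch.
    """
    edges = {}
    for a, b in combinations(jobs, 2):
        reasons = set()
        if a.nodes & b.nodes:
            reasons.add("shared_node")
        if time_gap(a, b) <= max_gap:
            reasons.add("temporal")
        if reasons:
            key = tuple(sorted((a.job_id, b.job_id)))
            edges[key] = reasons
    return edges


jobs = [
    Job("j1", frozenset({"n1", "n2"}), 0, 100),
    Job("j2", frozenset({"n2"}), 500, 600),
    Job("j3", frozenset({"n3"}), 130, 200),
]
print(build_job_graph(jobs, max_gap=60))
```

Here `j1` and `j2` are linked because they share node `n2`, and `j1` and `j3` are linked because `j3` starts 30 seconds after `j1` ends. A real analysis would likely use a graph library and richer edge attributes; plain dictionaries keep the sketch dependency-free.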
The high-performance computing (HPC) scheduling landscape currently faces new challenges due to the chan...
MapReduce is a programming paradigm for parallel processing that is increasingly being used for data...
The dataset in the tarball was used as job- and power-trace input for the paper "What does Power Con...
High-Performance Computing (HPC) systems need to be constantly monitored to ensure their stability. ...
Master of Science, Department of Computer Science. William H. Hsu. This thesis addresses the task of analy...
Hundreds of petabytes of experimental data in high energy and nuclear physics (HENP) have already be...
Node downtime and failed jobs in a computing cluster translate into wasted resources and user dissat...
Performance analysis is an essential task in high-performance computing (HPC) systems, and it is app...
LIG Research Reports (Les rapports de recherche du LIG), ISSN: 2105-0422. In the HPC community, the System Utilization metric ena...
International audience. Visualization strategies are a valuable tool in the performance evaluation of ...
Advisors: Michael Papka. Committee members: Kirk Duffin; Nicholas Karonis. The routine execution of jo...
Hundreds of petabytes of experimental data in high energy and nuclear physics (HENP) have been colle...
Large high-performance computing systems are built with an increasing number of components with more CP...
As multicore architectures become mainstream, an in-depth understanding of how applications behave o...