This paper provides a systematic comparison of the characteristics of computationally intensive workloads. Our analysis focuses on standard HPC benchmarks and representative applications. For the selected workloads, we provide a wide range of characterizations based on instruction tracing and hardware counter measurements. Each workload is analyzed at the instruction level by comparing the dynamic distribution of executed instructions. We also analyze memory access patterns, including various aspects of cache utilization and the locality properties of address distributions. Since prefetching plays an important role in the performance of computational workloads, we explore the prefetching potential, and for parallel workloads we study the shari...
Several benchmarks for measuring memory performance of HPC systems along dimensions of spatial and t...
The increasing performance gap between processors and memory will force future architectures to devo...
This paper presents a framework for characterizing the distribution of fine-grained parallelism, dat...
The next generation of supercomputers will feature a diverse mix of accelerator devices. The increas...
The analysis of workload traces from real production parallel machines can aid a wide variety of par...
Workload characterization has proven to be an essential tool for architecture design and performance e...
The performance of supercomputer schedulers is greatly affected by the characteristics of...
As the number of compute cores per chip continues to rise faster than the total amount of available ...
In high-performance computing (HPC) environments, an appropriate amount of hardware resources must b...
Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are ...
As detailed in recent reports, HPC architectures will continue to change over the next deca...
Having a representative workload of the target domain of a microprocessor is extremely important th...
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-96983-1_10 Des...
grantor: University of Toronto. Understanding the characteristics of parallel workloads aids...