Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from the SPEC, NAS, PERFECT, and RICEPS benchmark suites, we measure statistics such as speedups, memory costs, causes of cache misses, cache line utilization, and data traffic. This exploration allows us to draw several conclusions. First, we find that larger granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second,...
The distribution of resources among processors, memory and caches is a crucial question faced by des...
Automatic parallelizing compilers are often constrained in their transformations because they must c...
Memory bandwidth has become the performance bottleneck for memory-intensive programs on modern proce...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
Measurements of actual supercomputer cache performance have not previously been undertaken. PFC-Sim i...
In recent years the High Performance Computing (HPC) industry has benefited from the development of ...
Although caches in computers are invisible to programmers, they significantly affect programs' perfor...
In this paper, we examine the relationship between these factors in the context of large-scale, network...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and ...
In this paper we analyze the effect of compiler optimizations on fine-grain parallelism in scalar pr...
The last decade has produced enormous improvements in processor speeds without a corresponding impro...
Applications with regular patterns of memory access can experience high levels of cache conflict mis...
This paper presents a model to evaluate the performance and overhead of parallelizing sequential cod...
This paper presents a multi-cache profiler for shared memory multiprocessor systems. For each progra...