Abstract: Sparse scientific codes face grave performance challenges as memory bandwidth limitations grow on multi-core architectures. We investigate the memory behavior of a key sparse scientific kernel and study model-driven performance evaluation in this scope. We propose the Coupled Reuse-Cache Model (CRC Model) to enable multilevel cache performance analysis of parallel sparse codes. Our approach builds separate probabilistic application and hardware models, which are coupled to discover unprecedented insight into software-hardware interactions in the cache hierarchy. We evaluate our model's predictive performance with the pervasive sparse matrix-vector product kernel, using 1 to 16 cores and multiple cache configurations. For multi-core...
Understanding multicore memory behavior is crucial, but can be challenging due to the cache hierarc...
We develop a reuse distance/stack distance based analytical modeling framework for efficient, online...
While there are many studies on the locality of dense codes, few deal with the locality of sparse co...
The increasing computation capability of servers comes with a dramatic increas...
Performance on multicore processors is determined largely by on-chip cache. Computer architects hav...
The context of this work is performance models of software systems, which are used for predicting p...
Multicore Reuse Distance (RD) analysis is a powerful tool that can potentially provide a parallel pr...
To increase performance, modern processors employ complex techniques such as out-of-order pipelines ...
This paper presents and validates methods to extend reuse distance analysis of application locality ...
The potential for improving the performance of data-intensive scientific programs by enhancing data ...
Performance metrics and models are prerequisites for scientific understanding and optimization. This...
HPC applications usually run at a low fraction of the computer's peak performance. Empirical perform...
Many scientific applications handle compressed sparse matrices. Cache behavior during the executio...
Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache ...