The performance and energy efficiency of modern architectures depend on memory locality, which can be improved by thread and data mappings that take the memory access behavior of parallel applications into account. In this paper, we propose IPM, a mechanism that analyzes the memory access behavior using information about the time the entry of each page resides in the Translation Lookaside Buffer (TLB). It provides very accurate information with a very low overhead. We present experimental results from simulation and from real machines, with average performance improvements of 13.7% and energy savings of 4.4%, which come from reductions in cache misses and interconnection traffic.
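As a rough illustration of the idea of using TLB residency time as a proxy for memory access behavior, the following toy sketch accumulates, for each page, how long its entry stayed in the TLB of each node and then picks the node with the largest total. Everything here is an assumption made for illustration: the class `ResidencyTracker`, its method names, the use of abstract integer timestamps, and the "largest accumulated residency wins" policy are hypothetical and are not taken from IPM itself.

```python
from collections import defaultdict


class ResidencyTracker:
    """Toy model (hypothetical): accumulate TLB residency time per page and node."""

    def __init__(self):
        # (node, page) -> timestamp at which the entry was inserted into the TLB
        self._inserted_at = {}
        # page -> node -> total time the page's entry resided in that node's TLB
        self._residency = defaultdict(lambda: defaultdict(int))

    def on_tlb_insert(self, node, page, now):
        """Record that `page` entered the TLB of `node` at time `now`."""
        self._inserted_at[(node, page)] = now

    def on_tlb_evict(self, node, page, now):
        """On eviction, add the elapsed residency time to the page's total."""
        start = self._inserted_at.pop((node, page), None)
        if start is not None:
            self._residency[page][node] += now - start

    def preferred_node(self, page):
        """Return the node with the largest accumulated residency, or None."""
        per_node = self._residency.get(page)
        if not per_node:
            return None
        return max(per_node, key=per_node.get)


tracker = ResidencyTracker()
tracker.on_tlb_insert(node=0, page=0x1000, now=0)
tracker.on_tlb_evict(node=0, page=0x1000, now=50)    # 50 time units on node 0
tracker.on_tlb_insert(node=1, page=0x1000, now=10)
tracker.on_tlb_evict(node=1, page=0x1000, now=200)   # 190 time units on node 1
print(tracker.preferred_node(0x1000))
```

In this sketch the page would be considered a candidate for placement on node 1, since its entry spent more time in that node's TLB. A real mechanism would also have to handle entries that are never evicted and decide when migration is worth its cost, which this sketch does not attempt.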
Many-core architectures are excellent in hiding memory-access latency by low-overhead context switch...
Optimizing memory access is critical for performance and power efficiency. CPU manufacture...
Address translation is an essential part of current systems. Getting the virtual-to-physical mapping...
Current and future architectures rely on thread-level parallelism to sustain p...
The parallelism in shared-memory systems has increased significantly with the ...
The complexity of an efficient thread management steadily rises with the number of processor cores a...
We present a completely new kind of approach for mapping the computation of an application to MP-SOC...
Many-core processors are becoming mainstream computing platforms nowadays. How to map the applicatio...
Thread mapping has been extensively used as a technique to efficiently exploit...
Data mining is the process of extracting useful information or patterns from large raw sets of data....
Thesis (Ph.D.), University of Rochester, Department of Computer Science, 2017. On modern processors, ...
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected a...
A number of highly-threaded, many-core architectures hide memory-access latency by low-overh...