The performance and energy efficiency of modern architectures depend on memory locality, which can be improved by thread and data mappings that take the memory access behavior of parallel applications into account. In this paper, we propose IPM, a mechanism that analyzes the memory access behavior using information about how long the entry of each page resides in the Translation Lookaside Buffer (TLB). It provides accurate information at low overhead. We present experimental results from simulation and from real machines, with average performance improvements of 13.7% and energy savings of 4.4%, which come from reductions in cache misses and interconnection traffic.
Journal Article. Conventional microarchitectures choose a single memory hierarchy design point target...
Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a majo...
Journal Article. In future multi-cores, large amounts of delay and power will be spent accessing data...
International audience. Current and future architectures rely on thread-level parallelism to sustain p...
International audience. The parallelism in shared-memory systems has increased significantly with the ...
The complexity of an efficient thread management steadily rises with the number of processor cores a...
As thread-level parallelism increases in modern architectures due to larger numbers of cores per chi...
Thesis (Ph. D.)--University of Rochester. Department of Computer Science, 2017. On modern processors, ...
International audience. Efficiently programming shared-memory machines is a difficult challenge becaus...
This thesis studies the use of software methods to improve memory performance in a heterogeneous cac...
In this thesis, we propose and evaluate several techniques to dynamically increase the memory access...
This paper presents COMPROF and COMPLACE, a novel profiling tool and thread placement technique for ...
Data mining is the process of extracting useful information or patterns from large raw sets of data....