Ensuring that parallel applications continue to scale on many-core processors is challenging because of the complex relationship between the parallelism available in an application and the limited shared on-chip resources. Two main bottlenecks that limit the scalability of parallel applications are synchronization and memory bandwidth. In this thesis, I propose MiSAR, a minimalistic synchronization accelerator (MSA) that supports all three commonly used synchronization primitives (locks, barriers, and condition variables), and a novel overflow management unit (OMU) that dynamically manages its (very) limited hardware synchronization resources. The OMU allows safe and efficient dynamic transitions between using hardware (MSA) and software synchronization implem...
Estimating the potential performance of parallel applications on the yet-to-be-...
Heterogeneous processors with accelerators provide an opportunity to improve performance within a...
Supercomputers are used to solve some of the world’s most computationally demanding problems. Exasc...
Single chip multicore processors are now prevalent and processors with hundreds of cores are being p...
The objective of this work is to investigate the algorithm design and the programming model of mult...
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved...
Performance analysis tools are essential to the maintenance of efficient parallel execution of scien...
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a ch...
The shift towards multicore processing has led to a much wider population of developer...
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner ...
Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as th...
This paper reviews some important issues for scalability in programming and future trend with man...
Estimating the potential performance of parallel applications on the yet-to-b...
A number of highly-threaded, many-core architectures hide memory-access latency by low-overh...
Over the past decade, multicore machines have become the norm. A single machine is capable of having...