Most high-performance processors today are able to execute multiple threads of execution simultaneously. Threads share processor resources, such as the last-level cache, and this sharing may decrease throughput in non-obvious ways, depending on the characteristics of the threads. Computer architects usually study multiprogrammed workloads by considering a set of benchmarks and some combinations of these benchmarks. Because cycle-accurate microarchitecture simulators are slow, we want a set of combinations that is as small as possible, yet representative. However, there is no standard method for selecting such a sample, and different authors have used different methods. It is not clear how the choice of a particular sample impacts the conclusions of a study. We propo...
Nowadays, many scientific applications need to be parallelized. This parallelization allows to compl...
Current supercomputer architectures are subject to memory-related issues. For instance, we can observ...
In modern High Performance Computing architectures, the memory subsystem is a common performance ...
Over the past several years, classical multiprocessor systems have evolved into multicores, which tightly inte...
The complexity of CPUs has increased considerably since their beginnings, introducing mechanisms suc...
This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edg...
Recent advances in processor technology have led to affordable multi-core processors, which could e...
This report presents a study of techniques used to speed up a scientific simula...
Cloud computing promises the delivery of on-demand pay-per-use access to unlimited resources. Using ...
Finite difference methods are, in general, well suited to execution on parallel machines and are thu...
In this report, we study the problem of optimizing the throughput of applications for heterogeneous ...
Task-based models and runtimes are quite popular in the HPC community. They help to implement applica...
Load balancing is an important step conditioning the performance of parallel programs. If the worklo...