Starting from a Data-Flow execution model called “DF-Threads”, we defined a minimalistic API to enable an efficient hardware implementation of thread distribution across the cores of a single multi-core system and across the remote cores of a cluster. We aim to propose this API as a simple programming model in the C language that can potentially provide an easy interface between DF-Threads and generic programming models. Since clusters are typically programmed with MPI, we evaluated our approach against OpenMPI. In terms of delivered GFLOPS per core, DF-Threads are also competitive with CUDA. In the basic examples used in this initial investigation, DF-Threads achieve better performance-per-core ...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
Holistic tuning and optimization of hybrid MPI and OpenMP applications is becoming a focus for paralle...
Traditionally, the compilation of data-parallel languages is targeted to low-level runtime environmen...
Current computing systems are mostly focused on achieving performance, programmability, energy effic...
Data-Flow Threads (DF-Threads) is a new execution model that makes it possible to seamlessly distribute the wo...
This thesis introduces the data-triggered threads (DTT) programming and execution model. Unlike thre...
Future exascale machines will require multi/many-core architectures able to energy-efficiently run multi-...
With the increasing prominence of many-core architectures and decreasing per-core resource...
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardwar...
In this paper, we show the potential benefits of translating OpenMP code to low-level parallel code ...
The trend toward developing increasingly intelligent systems leads directly to a considerable demand f...
The data-triggered threads (DTT) programming and execution model can increase parallelism and elimin...
We present a completely new approach for mapping the computation of an application to MP-SOC...
Threads provide a useful programming model for asynchronous behavior because of their ability to enc...