In heterogeneous computer architectures, the serial part of an application is coupled with domain-specific accelerators that promise high computing throughput and efficiency across a wide range of applications. In such systems, the serial part of a program is executed on a Central Processing Unit (CPU) core optimized for single-thread performance, while parallel sections are offloaded to Programmable Manycore Accelerators (PMCAs). This heterogeneity requires CPU cores and PMCAs to share data in memory efficiently, even though CPUs rely on a coherent memory system in which data moves in cache lines, whereas PMCAs rely on non-coherent scratchpad memories into which DMA engines transfer data in bursts. In this paper, we tackle the ch...
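To make the difference in transfer granularity concrete, here is a minimal toy model that is not taken from the paper. It counts the cache lines a coherent CPU access to a buffer would move, and splits the same buffer into DMA bursts destined for a scratchpad. The 64-byte line size, the 256-byte maximum burst, the rule that bursts are cut at burst-aligned boundaries, and the helper names are all illustrative assumptions.

```python
"""Toy model contrasting cache-line transfers (coherent CPU side) with
DMA bursts into a non-coherent scratchpad (PMCA side).

All sizes and the burst-splitting rule are illustrative assumptions,
not parameters of any specific system."""
from typing import List, Tuple

CACHE_LINE_BYTES = 64   # assumed cache-line size; real CPUs vary
MAX_BURST_BYTES = 256   # assumed maximum DMA burst length


def cache_lines_touched(addr: int, size: int, line: int = CACHE_LINE_BYTES) -> int:
    """Number of whole cache lines moved when the CPU accesses [addr, addr + size)."""
    if size <= 0:
        return 0
    first = addr // line
    last = (addr + size - 1) // line
    return last - first + 1


def dma_bursts(addr: int, size: int, max_burst: int = MAX_BURST_BYTES) -> List[Tuple[int, int]]:
    """Split [addr, addr + size) into (start, length) DMA bursts.

    Each burst ends at the next max_burst-aligned boundary (an assumed rule),
    so no single burst is longer than max_burst bytes.
    """
    bursts: List[Tuple[int, int]] = []
    cur, end = addr, addr + size
    while cur < end:
        boundary = (cur // max_burst + 1) * max_burst  # next aligned boundary
        length = min(boundary, end) - cur
        bursts.append((cur, length))
        cur += length
    return bursts
```

For example, under these assumed sizes, moving 1000 bytes that start at address 32 touches 17 cache lines, but the same data fits in 5 DMA bursts of 224, 256, 256, 256, and 8 bytes.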
A widely adopted design paradigm for many-core accelerators features processing elements grouped in ...
Increasing demand for power-efficient, high-performance computing has spurred a growing number and d...
Abstract — One of the key challenges in advanced micro-architecture is to provide high performance h...
Modern embedded systems on chip (SoCs) are heavily based on heterogeneous architectures that combine...
Heterogeneous parallel computing combines general-purpose processors with accelerators to efficientl...
Heterogeneous systems on chip (HeSoCs) combine general-purpose, feature-rich multi-core host process...
Field-Programmable Gate Array (FPGA) systems now comprise many processing elements that are proce...
In a previous PPoPP paper, we showed how the FLAME methodology, combined with the SuperMatrix runtim...
New-generation Systems-on-Chip will be extremely complex devices, composed of complex subsystems, ...
New architectures for extreme-scale computing need to be designed for higher energy efficiency than ...
We present design details and some initial performance results of a novel scalable shared memory mul...
The performance gap between CPUs and memory has widened significantly since the 1980s, maki...
This work describes a cache architecture and memory model for 1000+ core microprocessors. Our appro...
Many algorithms and applications in scientific computing exhibit irregular access patterns as consec...