There has been little work investigating the overall performance impact of on-chip communication in manycore compute accelerators. In this paper we evaluate the performance of a GPU-like compute accelerator that runs CUDA workloads and consists of compute nodes, an interconnection network, and the graphics DRAM memory system, using detailed cycle-level simulation. First, we study the performance of a baseline architecture employing a scalable mesh network. We then propose several microarchitectural techniques to exploit the communication characteristics of these applications while providing a cost-effective (i.e., low area) on-chip network. Instead of increasing costly bisection bandwidth, we increase the number of injection ports at the mem-o...
Many practical data-processing algorithms fail to execute efficiently on general-purpose CPUs (Centr...
As multi-core systems begin to appear, their possible applications, parallel performance and on-chip...
As the number of cores per die increases, be they processors, memory blocks, or custom accelerators...
Abstract—As the number of cores and threads in manycore compute accelerators such as Graphics Proces...
Abstract—As the number of cores and threads in manycore compute accelerators such as Graphics Proces...
Physical limits of power usage for integrated circuits have steered the microprocessor industry towa...
On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it...
As the number of cores per die increases, be they processors, memory blocks, or custom accelerators,...
2014-07-02 Many‐core processors will continue to proliferate in the next decade across the entire com...
Power density constraints and processor reliability concerns are causing energy efficient processor ...
Multi-core processors have rapidly grown in core count since the first commercial dual-core processo...
We describe a System-C based framework we are developing, to explore the impact of various architect...
To achieve high throughput, core count in compute accelerators such as General-Purpose Graphics Proc...
Modern processors have included hardware accelerators to provide high computation capability and low...
Many of the issues that will be faced by the designers of multi-billion transistor chips may be alle...