As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Prominent among the accelerators that have recently gained popularity are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and compilers reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet remains a favorable compiler target that maintains the same level of programmability as the others. In this thesis, I present the Hwacha decoup...
In the past decade, accelerators, commonly Graphics Processing Units (GPUs), have played a key role ...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...
We are attacking the memory bottleneck by building a “smart” memory controller that improves effect...
We present a taxonomy and modular implementation approach for data-parallel accelerators, including ...
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis, have seen an inc...
Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance compu...
General purpose processors and accelerators including system-on-a-chip and graphics processing units...
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
In heterogeneous computer architectures, the serial part of an application is coupled with domain-sp...
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and high-effic...
Many high performance applications run well below the peak arithmetic performance of the underlying ...
This paper proposes a new processor architecture for accelerating data-parallel applications based on ...
Many high performance applications run well below the peak arithmetic performance of the underlying...
The problem of automatically generating hardware modules from high level application representations...
Summarization: Mapping computationally intensive applications on reconfigurable technology for acceler...