Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer dereferencing for data accesses. By masking the memory latency, multi-threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-threaded engine is implemented on the FPGA accelerator specifically to mask the memory latency in the execution of irregular applications: following a memory access, the execution is switched to a ready thread while the suspended threads w...
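The latency-masking idea described above can be illustrated with a small cycle-count model. This is a minimal sketch under stated assumptions, not the MT-FPGA engine itself: the function `simulate`, its parameters, the single-issue engine, the fixed memory latency, and the rule that a suspended thread becomes ready again once its data returns are all hypothetical choices made for illustration.

```python
import heapq
from collections import deque


def simulate(num_threads, accesses_per_thread, mem_latency):
    """Count cycles for a toy single-issue engine that switches threads
    after every memory access (all parameters are illustrative assumptions)."""
    remaining = [accesses_per_thread] * num_threads
    # Threads that can issue their next memory access right now.
    ready = deque(range(num_threads)) if accesses_per_thread > 0 else deque()
    # Min-heap of (wake_cycle, thread_id) for threads waiting on memory.
    waiting = []
    cycle = 0
    while ready or waiting:
        # Wake every suspended thread whose data has returned by this cycle.
        while waiting and waiting[0][0] <= cycle:
            _, tid = heapq.heappop(waiting)
            if remaining[tid] > 0:
                ready.append(tid)
        if ready:
            # Issue one access (1 cycle), then suspend the thread until
            # the reply arrives mem_latency cycles later.
            tid = ready.popleft()
            remaining[tid] -= 1
            heapq.heappush(waiting, (cycle + 1 + mem_latency, tid))
            cycle += 1
        elif waiting:
            # No ready thread: the engine stalls until the earliest reply.
            cycle = waiting[0][0]
    return cycle


if __name__ == "__main__":
    # One thread exposes the full latency on every access.
    print(simulate(1, 3, 10))
    # Eleven threads keep the engine busy while replies are in flight.
    print(simulate(11, 3, 10))
```

In this toy model, a single thread pays the full latency on each of its accesses, whereas with enough threads the engine issues one access per cycle and only the final reply's latency remains exposed.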
Inexpensive DRAMs have created new opportunities for in-memory data analytics. However, the major bo...
General-Purpose Graphics Processing Units (GPGPUs) rose to prominence with the release of the Fermi...
As we observe diminishing returns for multi-core CPUs, especially when considering power budgets, FP...
The performance gap between CPUs and memory has widened significantly since the 1980s, maki...
The last two decades have witnessed two opposing hardware trends where the DRAM capacity and the acces...
Algorithms that exhibit irregular memory access patterns are known to show poor performance on multi...
Multithreading is a well-known technique for general-purpose systems to deliver a substantial perfor...
Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineerin...
The purpose of this paper is to show that multi-threading techniques can be applied to a vector proc...
Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and enginee...
Gustavson’s algorithm (i.e., the row-wise product algorithm) shows its potential as the backbone...
One of the key kernels in scientific applications is the Sparse Matrix Vector Multiplication (SMVM)....
Managing the memory wall is critical for massively parallel FPGA applications where data sets are l...
The performance improvement of conventional processors has begun to stagnate in recent years. Because...
Modern commodity processors such as GPUs may execute up to about a thousand physical threads per ...