The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software-controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and c...
While programmable accelerators such as application-specific processors and reconfigurable architect...
Computer engineering is advancing rapidly. For 55 years, the performance of integrated circuits has ...
Managing the memory wall is critical for massively parallel FPGA applications where data-sets are l...
Previous work has demonstrated soft-core vector processors in FPGAs can be applied to speed up data-...
Virtual memory is a classic computer science abstraction and is ubiquitous in all scales of computin...
In this work, we propose a Programmable Vector Memory Controller (PVMC), which boosts noncontiguous ...
We are attacking the memory bottleneck by building a “smart” memory controller that improves effect...
We present a taxonomy and modular implementation approach for data-parallel accelerators, including ...
Power consumption has become one of the dominant issues in processor design, especially imp...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
To manage power and memory wall effects, the HPC industry supports FPGA reconfigurable accelerators ...
The purpose of this paper is to show that using decoupling techniques in a vector processor, the per...
Vector architectures have been traditionally applied to the supercomputing domain with many successf...
In the low-end mobile processor market, power, energy, and area budgets are significantly lower than...
Sparse matrix operations are critical kernels in multiple application domains such as High Performanc...