We are attacking the memory bottleneck by building a “smart” memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting applications dictate how their data is accessed and cached. This paper describes a Parallel Vector Access unit (PVA), the vector memory subsystem that efficiently “gathers” sparse, strided data structures in parallel on a multi-bank SDRAM memory. We have validated our PVA design via gate-level simulation, and have evaluated its performance via functional simulation and formal analysis. On unit-stride vectors, PVA performance equals or exceeds that of an SDRAM system optimized for cache line fills. On vectors with larger strides, the PVA is up to 32.8 times faster. Ou...
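Gathering a strided vector in parallel across several SDRAM banks depends on each bank being able to work out which vector elements it holds. The Python sketch below illustrates that arithmetic under stated assumptions: word-interleaved banks (bank = address mod number of banks), a positive stride, and the hypothetical helper names `first_hit` and `elements_in_bank`. It is not the PVA's hardware algorithm. It only shows why one bank's share of a strided vector is an arithmetic progression that the bank can compute from its own bank number without scanning the whole vector.

```python
from math import gcd
from typing import List, Optional, Tuple


def first_hit(base: int, stride: int, length: int,
              bank: int, num_banks: int) -> Optional[Tuple[int, int]]:
    """Return (first element index, index step) for the elements of the
    vector base + i*stride, 0 <= i < length, that live in `bank`,
    or None if the bank holds none of them.

    Assumption (not from the paper): word-interleaved banks,
    i.e. bank(addr) = addr % num_banks.
    """
    assert stride > 0 and num_banks > 0
    g = gcd(stride, num_banks)
    step = num_banks // g          # the same bank recurs every `step` elements
    gap = (bank - base) % num_banks
    if gap % g != 0:               # the stride pattern never reaches this bank
        return None
    # Solve i * (stride/g) == gap/g (mod step); stride/g is invertible mod step.
    inv = pow(stride // g, -1, step) if step > 1 else 0
    i0 = (gap // g) * inv % step
    return (i0, step) if i0 < length else None


def elements_in_bank(base: int, stride: int, length: int,
                     bank: int, num_banks: int) -> List[int]:
    """Indices of the vector elements held by `bank`, computed without a scan."""
    hit = first_hit(base, stride, length, bank, num_banks)
    if hit is None:
        return []
    i0, step = hit
    return list(range(i0, length, step))


# Each bank's result depends only on its own bank number, so hardware banks
# could compute their shares concurrently; here we simply loop over them.
for b in range(8):
    print(b, elements_in_bank(base=0, stride=3, length=16, bank=b, num_banks=8))
```

When the stride shares a factor with the bank count, fewer banks are touched: a long vector with stride 4 over 8 banks uses only num_banks / gcd(stride, num_banks) = 2 banks, which is one reason large strides defeat naive interleaving and leave most banks idle.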
Many high-performance applications run well below the peak arithmetic performance of the underlying...
The disparity between microprocessor clock frequencies and memory latency is a primary reason why ma...
As we approach the end of conventional technology scaling, computer architects are forced to incorpo...
This paper presents mathematical foundations for the design of a memory controller subcomponent that...
This paper introduces an innovative cache design for vector computers, called prime-mapped cache. By...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
In this work, we propose a Programmable Vector Memory Controller (PVMC), which boosts noncontiguous ...
The focus of this paper is on designing both a low-cost and high-performance, high-bandwidth vector ...
Memory system efficiency is crucial for any processor to achieve high performance, especially in the...
This article presents a Computational SRAM (C-SRAM) solution combining In- and N...
We present a taxonomy and modular implementation approach for data-parallel accelerators, including ...
To manage power and memory wall effects, the HPC industry supports FPGA reconfigurable accelerators ...