Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free vector access for some strides in vector processors with multi-module memories. In this paper, we extend these schemes to achieve conflict-free access for a larger number of strides. The basic idea is to access vectors of fixed length, equal to that of the processor's vector registers, out of order. Both matched and unmatched memories are considered; we show that the number of conflict-free strides is even larger in the latter case. The hardware for address calculation and access control is described and shown to be of complexity similar to that required for in-order access.
Peer Reviewed
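The idea of out-of-order access can be illustrated with a small simulation. The sketch below is not the scheme from the paper: the skewed mapping `module`, the parameters `M`, `T`, `L`, `SKEW_BITS`, and the greedy reordering in `greedy_order` are all assumptions chosen for illustration. It models one request per cycle and a matched memory (module busy time equal to the number of modules), and shows a stride that conflicts when elements are requested in order but becomes conflict-free when the same elements are requested in a different order.

```python
M = 8          # number of memory modules (assumed)
T = 8          # cycles a module stays busy; T == M models a matched memory
L = 64         # vector register length (assumed)
SKEW_BITS = 4  # hypothetical skew granularity: the skew advances every 2**4 words


def module(addr):
    """Illustrative skewed interleaving: low-order module number shifted by a row index."""
    return (addr + (addr >> SKEW_BITS)) % M


def conflict_free(order, base, stride):
    """Issue one request per cycle in `order`; fail if a module is hit while still busy."""
    last_use = {}
    for cycle, k in enumerate(order):
        m = module(base + k * stride)
        if m in last_use and cycle - last_use[m] < T:
            return False
        last_use[m] = cycle
    return True


def greedy_order(base, stride):
    """Each cycle, issue the lowest-index pending element whose module is idle.

    Returns None if some cycle has no issuable element; a greedy choice is
    not guaranteed to succeed for every base and stride.
    """
    pending = list(range(L))
    last_use = {}
    order = []
    for cycle in range(L):
        for k in pending:
            m = module(base + k * stride)
            if m not in last_use or cycle - last_use[m] >= T:
                break
        else:
            return None  # every pending element targets a busy module
        pending.remove(k)
        last_use[m] = cycle
        order.append(k)
    return order


in_order = list(range(L))
reordered = greedy_order(base=0, stride=2)
print(conflict_free(in_order, 0, 2))                              # False
print(reordered is not None and conflict_free(reordered, 0, 2))   # True
```

For stride 2 and base address 0, the in-order sequence revisits module 0 after only four cycles, whereas the reordered sequence begins with elements 0, 1, 2, 3, 8, 9, 10, 11 and touches all eight modules before returning to any of them. A real design would generate the reordered addresses with dedicated hardware rather than a search; the greedy loop here only demonstrates that a suitable order exists for this toy configuration.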
Register renaming and out-of-order instruction issue are now commonly used in superscalar processors...
Vector processors often use a cache to exploit temporal locality and reduce memory bandwidth demands...
Memory system efficiency is crucial for any processor to achieve high performance, especially in the...
The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnectio...
An address mapping and an access order is presented for conflict-free access to vectors with any ini...
When accessing streams in vector multiprocessor machines, degradation in the interconnection network...
The high latency of memory accesses is one of the factors that most contribute to reduce the perform...
Proceedings of the 1993 IEEE Region 10 Conference on Computer, Communication, Control and Power Engi...
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for Paral...
Vector supercomputers, which can process large amounts of vector data efficiently, are among the fas...
The synchronized and simultaneous access to several vectors that form a single stream occurs in SIMD...