The high latency of memory accesses is one of the main factors limiting the performance of current vector supercomputers. Conflicts in the memory modules, together with collisions in the interconnection network in the case of multiprocessors, significantly increase the execution time of applications. In this work we propose a memory access method that, for both vector uniprocessors and multiprocessors, allows stream accesses to be performed with the smallest possible latency in the majority of cases. The basic idea is to arbitrate the memory access by defining the order in which the memory modules are visited. The stream elements are requested out of order. In addition, the access method also re...
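The effect of requesting stream elements out of order can be illustrated with a toy model. The sketch below is not the method proposed in the abstract. It assumes a hypothetical block-interleaved mapping (`module_of`), one request issued per cycle, and a fixed module busy time, and it compares in-order issue with an order that visits the modules round-robin. All function names and parameters are illustrative assumptions.

```python
from collections import deque


def module_of(address, num_modules, block):
    """Assumed mapping: `block` consecutive addresses share one module."""
    return (address // block) % num_modules


def round_robin_order(base, stride, length, num_modules, block):
    """Return element indices reordered so modules are visited in turn."""
    queues = [deque() for _ in range(num_modules)]
    for i in range(length):
        queues[module_of(base + i * stride, num_modules, block)].append(i)
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


def issue_cycles(order, base, stride, num_modules, block, busy_time):
    """Cycles needed to issue all requests, one per cycle at most.

    A module that accepts a request stays busy for `busy_time` cycles;
    a request to a busy module stalls the whole stream until it is free.
    """
    free_at = [0] * num_modules
    cycle = 0
    for i in order:
        m = module_of(base + i * stride, num_modules, block)
        start = max(cycle, free_at[m])
        free_at[m] = start + busy_time
        cycle = start + 1
    return cycle


if __name__ == "__main__":
    params = dict(base=0, stride=1, num_modules=4, block=4)
    in_order = list(range(16))
    reordered = round_robin_order(length=16, **params)
    print("in order :", issue_cycles(in_order, busy_time=4, **params))
    print("reordered:", issue_cycles(reordered, busy_time=4, **params))
```

In this toy configuration the in-order stream hits the same module four times in a row and stalls on each repeat, while the reordered stream never finds a module busy. Real memory systems and the abstract's actual arbitration scheme may behave differently.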
Presently, the highest performance computer systems are the vector processors which are mainly emplo...
Vector processors often use a cache to exploit temporal locality and reduce memory bandwidth demands...
Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performan...
When accessing streams in vector multiprocessor machines, degradation in the interconnection network...
The performance of a vector processor accessing vectors is strongly dependent on the conflicts produ...
The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnectio...
Most existing analytical models for memory interference generally assume random bank selection for e...
The synchronized and simultaneous access to several vectors that form a single stream occurs in SIMD...
Address transformation schemes, such as skewing and linear transformations, have been proposed to ac...
Vector supercomputers, which can process large amounts of vector data efficiently, are among the fas...
The purpose of this paper is to show that multi-threading techniques can be applied to a vector proc...
This paper presents mathematical foundations for the design of a memory controller subcomponent that...
On many commercial supercomputers, several vector register processors share a global highly interlea...