On many commercial supercomputers, several vector register processors share a global, highly interleaved memory in MIMD mode. When all the processors are working on a single vector loop, a significant part of the potential memory throughput may be wasted because the processors run asynchronously. To limit this loss of memory throughput, a SIMD synchronization mode for vector accesses to memory may be used, but an important part of the memory bandwidth may still be wasted when accessing vectors with an even stride. In this paper, we present IPS, an interleaved parallel scheme, which ensures an equitable distribution of elements on a highly interleaved memory for a wide range of vector strides. We show how to organize access to memory, such tha...
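The even-stride problem mentioned in the abstract can be illustrated with a small sketch. It assumes plain low-order interleaving, where word address `a` is stored in bank `a mod num_banks`; this is a textbook baseline, not the IPS scheme itself, and the function names and the 16-bank configuration are hypothetical choices made for the illustration.

```python
from math import gcd


def banks_touched(stride, num_banks, length, base=0):
    """Set of bank indices hit by a strided vector access.

    Assumes simple low-order interleaving (bank = address mod num_banks),
    which is an illustrative baseline and not the IPS mapping.
    """
    return {(base + i * stride) % num_banks for i in range(length)}


def distinct_banks(stride, num_banks):
    """Number of distinct banks a sufficiently long strided access can reach."""
    return num_banks // gcd(stride, num_banks)


if __name__ == "__main__":
    NUM_BANKS = 16  # hypothetical power-of-two bank count
    for stride in (1, 2, 3, 4, 8, 16):
        used = len(banks_touched(stride, NUM_BANKS, length=64))
        print(f"stride {stride:2d}: {used:2d} of {NUM_BANKS} banks used")
```

Under this assumed mapping, odd strides reach every bank, while an even stride reaches only `num_banks / gcd(stride, num_banks)` of them, so the remaining banks sit idle and the corresponding share of bandwidth is lost.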
As the rate of annual data generation grows exponentially, there is a demand to aggregate and summar...
This paper presents data confirming the fact that traditional vector architectures cannot reduce th...
We present a taxonomy and modular implementation approach for data-parallel accelerators, including ...
IRISA - Publication interne no 646, 14 p., March 1992. SIGLE. Available at INIST (FR), Document Supply Se...
Vector supercomputers, which can process large amounts of vector data efficiently, are among the fas...
The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnectio...
Interleaved memories are often used to provide the high bandwidth needed by multiprocessors and high...
Proceedings of the 1993 IEEE Region 10 Conference on Computer, Communication, Control and Power Engi...
The synchronized and simultaneous access to several vectors that form a single stream occurs in SIMD...
For high throughput applications, turbo-like iterative decoders are implemente...
This paper presents mathematical foundations for the design of a memory controller subcomponent that...
The purpose of this paper is to show that multi-threading techniques can be applied to a vector proc...
Many applications use vector operations by applying a single instruction to multiple data that map to ...