Abstract: Concurrent access to bank-interleaved memory structures has been studied for decades, particularly in the context of vector supercomputer systems. It is still a common belief that using a number of banks different from 2^n requires inserting complex hardware, including a non-trivial divider, on the access path to the memory. In 1993, two independent studies [1], [2] showed that, by leveraging a very simple arithmetic result, the Chinese Remainder Theorem, this Euclidean division is not needed when the number of banks is prime or simply odd. In the mid-1990s, interest in vector supercomputers faded and the research topic disappeared. Interest in bank-interleaved caches has reappeared recently [3]...
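The role of the Chinese Remainder Theorem can be illustrated with a minimal sketch. It is not taken from the cited studies; it assumes a memory of p banks holding 2^k words each, and the function names (crt_map, is_bijective) are hypothetical. When p is odd, p and 2^k are coprime, so the pair (address mod p, address mod 2^k) identifies every address below p * 2^k uniquely. The offset inside the bank is then just the low k bits of the address, and the quotient address / p is never computed.

```python
def crt_map(addr, num_banks, offset_bits):
    """Map a word address to (bank, offset) without dividing by num_banks.

    Assumption for this sketch: each bank holds 2**offset_bits words.
    A conventional mapping would use offset = addr // num_banks, which
    needs a divider when num_banks is not a power of two.
    """
    bank = addr % num_banks                    # remainder only, no quotient
    offset = addr & ((1 << offset_bits) - 1)   # low bits, i.e. addr mod 2**offset_bits
    return bank, offset


def is_bijective(num_banks, offset_bits):
    """Check that every address in the memory gets its own (bank, offset) slot."""
    total = num_banks << offset_bits
    slots = {crt_map(a, num_banks, offset_bits) for a in range(total)}
    return len(slots) == total


if __name__ == "__main__":
    print(crt_map(23, 7, 4))     # 7 banks of 16 words
    print(is_bijective(7, 4))    # odd bank count: coprime with 16
    print(is_bijective(6, 4))    # even bank count: slots collide
```

The mapping still needs the remainder modulo p to select the bank; the sketch only shows that the in-bank offset can be read directly from the address bits when p is odd, and that the same shortcut fails for an even number of banks.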
We develop a cache-oblivious data structure for storing a set S of N axis-aligned rectangles in the ...
In this paper we demonstrate the practical portability of a simple version of matrix multiplication ...
We propose a novel kernel-level memory allocator, called M3 (Mcube, Multi-core Multi-bank Memory all...
Using a prime number N of memory banks on a vector processor allows a conflict-free access for any s...
Vector supercomputers, which can process large amounts of vector data efficiently, are among the fas...
Abstract—Modern high performance processors require memory systems that can provide access to data a...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled ...
On many commercial supercomputers, several vector register processors share a global highly interlea...
This paper introduces an innovative cache design for vector computers, called prime-mapped cache. By...
Interleaved memories are often used to provide the high bandwidth needed by multiprocessors and high...
Database systems access memory either sequentially or randomly. Contrary to sequential access and de...
High performance architectures depend heavily on efficient multi-level memory hierarchies to minimiz...