Recent advances in shared-memory multiprocessor system-on-a-chip (MP-SOC) architectures include the use of special step caches to implement concurrent-read concurrent-write memory access efficiently. Unfortunately, existing step cache techniques do not support multioperations, which can speed up the execution of a number of parallel algorithms by a logarithmic factor. In this paper, we propose an architectural technique for implementing multioperations on step-cached MP-SOCs even when cache associativity is limited. The technique is based on simple active memory units, faster memory modules, and small processor-level memory blocks called scratchpads. We evaluate the performance and area requirements of the proposed technique on our ...
Due to VLSI lithography problems and the limitation of additional architectural enhancements uniproc...
Moving threads is a theoretically interesting approach for mapping the computation of an application...
Multioperations are primitives of parallel computation for which processors perform a reduction, e.g...
In this paper we introduce a novel class of caches, named step caches, that can be used to implement...
Step caches are caches in which data entered to a cache array is kept valid only until the end of o...
Niemann J-C, Liß C, Porrmann M, Rückert U. A Multiprocessor Cache for Massively Parallel SoC Archite...
The performance of a computing system heavily depends on the memory hierarchy. Fast but expensive ca...
We introduce an architectural approach to improve memory system performance in both uniprocessor and...
As the number of cores increases in both incoming and future shared-memory chip-multiprocessor (CMP...
We propose efficient stack simulation algorithms for shared memory multiprocessor (MP) c...
On the road to computer systems able to support the requirements of exascale applications, Chip Mult...
Manufacturers are focusing on multiprocessor-system-on-a-chip (MPSoC) architectures in order to prov...
We present a completely new kind of approach for mapping the computation of an application to MP-SOC...