A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work, we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32-bit many-core system with 256 fast RV32IMA 'Snitch' cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25 °C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most 5 cycles. In MemPool's physical-aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect. We com...
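The claim that MemPool is easy to program hinges on every core seeing the same L1 scratchpad. The following is a minimal bare-metal C sketch of that programming model, not MemPool's actual software stack: the runtime calls get_core_id() and barrier(), the ".l1" linker section, and the constants are hypothetical stand-ins used only for illustration.

    #include <stdint.h>

    #define NUM_CORES 256
    #define BUF_LEN   4096

    /* Buffer placed in the shared L1 scratchpad (".l1" is an assumed linker
     * section); every core sees the same copy, so no DMA or message passing
     * is needed to share data. */
    static uint32_t buf[BUF_LEN] __attribute__((section(".l1")));

    extern uint32_t get_core_id(void); /* hypothetical: returns 0..NUM_CORES-1 */
    extern void     barrier(void);     /* hypothetical: cluster-wide barrier   */

    void scale_kernel(uint32_t factor) {
        uint32_t id = get_core_id();
        /* Each core processes an interleaved slice of the shared buffer;
         * every load/store is an ordinary L1 access with bounded latency. */
        for (uint32_t i = id; i < BUF_LEN; i += NUM_CORES) {
            buf[i] *= factor;
        }
        barrier(); /* all cores observe the updated buffer past this point */
    }

Because the scratchpad is globally addressable, work distribution reduces to index arithmetic rather than explicit data movement, which is the essence of the shared-L1 programming model described above.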
The steeply growing performance demands for highly power- and energy-constrained processing s...
A reliable and variation-tolerant architecture for shared-L1 processor clusters is proposed. The arc...
Improving the performance of future computing systems will be based upon the ability of increasing t...
A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) conf...
The evolution of multi- and many-core platforms is rapidly increasing the available on-chip computat...
A shared-L1 cache architecture is proposed for tightly coupled processor clusters. Sharing an L1 tig...
L1 instruction caches in many-core systems represent a sizable fraction of the total power consumpt...
In this brief, we propose a variation-tolerant architecture for shared-L1 processor clusters ...
Power-efficient architectures have become the most important feature required ...
High performance and extreme energy efficiency are strong requirements for a fast-growing number of ...
Shared L1 memory is an interesting architectural option for building tightly-coupled multi-core...
On-chip L2 cache architectures, well established in high-performance parallel computing syste...
High energy efficiency and high performance are the key requirements for Internet of Things (IoT) edge ...
The design of the memory hierarchy in a multi-core architecture is a critical component since it mus...
Shared L1 memories are of interest for tightly-coupled processor clusters in programmable accel...