Shared L1 memories are of interest for tightly-coupled processor clusters in programmable accelerators, as they provide a convenient shared-memory abstraction while avoiding cache-coherence overheads. The performance of a shared L1 memory critically depends on the architecture of the low-latency interconnect between processors and memory banks, which needs to provide ultra-fast access to the largest possible L1 working set. The advent of 3D technology provides new opportunities to improve the interconnect delay and the form factor. In this paper we propose a network architecture, 3D-LIN, based on 3D integration technology. The network can be configured based on user specifications and technology constraints to provide fast access to L1 memo...
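As context for the shared multi-banked L1 organization described above, the following minimal C sketch shows word-level interleaving of addresses across banks, which is one common way such clusters spread concurrent processor accesses over the interconnect. The bank count, word size, and helper names (NUM_BANKS, WORD_BYTES, l1_bank) are illustrative assumptions, not details taken from the 3D-LIN paper.

/*
 * Minimal sketch (assumed parameters, not from the cited work):
 * word-level bank interleaving for a shared multi-banked L1.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS   16u   /* assumed number of L1 banks (power of two) */
#define WORD_BYTES   4u   /* assumed word size in bytes                */

/* Low-order word-address bits select the bank, so consecutive words map
 * to different banks and parallel accesses from the cluster's processors
 * are spread across the interconnect. */
static inline uint32_t l1_bank(uint32_t addr)
{
    return (addr / WORD_BYTES) % NUM_BANKS;
}

/* Remaining bits give the word offset inside the selected bank. */
static inline uint32_t l1_bank_offset(uint32_t addr)
{
    return (addr / WORD_BYTES) / NUM_BANKS;
}

int main(void)
{
    /* Consecutive word addresses land in consecutive banks. */
    for (uint32_t addr = 0; addr < 8 * WORD_BYTES; addr += WORD_BYTES)
        printf("addr 0x%02x -> bank %2u, offset %u\n",
               (unsigned)addr, (unsigned)l1_bank(addr),
               (unsigned)l1_bank_offset(addr));
    return 0;
}

Interleaving at word granularity keeps consecutive accesses in different banks, which is what allows a low-latency crossbar or logarithmic interconnect to serve many processors with few bank conflicts.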
In this paper, we present a 3D-mesh architecture which is utilized as a processor-memory interconnec...
To reduce the congestion caused by data contending for channels, we present a low-delay and en...
The large required size, together with tolerance to latency and to variations in memory access time, makes L2 memory a ...
Shared tightly coupled data memories are key architectural elements for building multi-core clusters...
In this paper we propose two synthesizable 3D network architectures: C-LIN and D-LIN, which a...
The main aim of this thesis is to examine the advantages of 3D stacking applied to microprocessors a...
As Moore’s Law slows down, new integration technologies emerge, such as 3D integration, silicon inte...
L2 memory, serving multiple clusters of tightly coupled processors, is well-suited for 3D integratio...
Shared L1 memory is an interesting architectural option for building tightly-coupled multi-core...
The objective of this thesis is to optimize the uncore of 3D many-core architectures. More specifica...
Thanks to their brain-like properties, neural networks outperform traditional ...
The performance of most digital systems today is limited by the interconnect latency between logic a...