While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PEs should be. Architecting PEs as vector processors holds the promise of greatly reducing their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include micro-architectural tricks to improve Instruction Level Parallelism (ILP), which increase their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-...
In the last 15 years, power dissipation and energy consumption have become crucial design concerns f...
Moore’s Law predicted that the number of transistors on a chip would double approximately every 2 ye...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...
In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's ...
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Compu...
In the low-end mobile processor market, power, energy, and area budgets are significantly lower than...
The steeply growing performance demands for highly power- and energy-constrained processing s...
Modern high-performance computing architectures (Multicore, GPU, Manycore) are based on tightly-coup...
Modern scientific applications are getting more diverse, and the vector lengths in those application...
In the low-end mobile processor market, power, energy and area budgets are significantly lower than ...
Vector Processors (VPs) created the breakthroughs needed for the emergence of computational science ...
Power constraints led to the end of exponential growth in single-processor performance, which charac...
A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 core...
This paper presents data confirming that traditional vector architectures cannot reduce th...
Vector processing has become commonplace in today's CPU microarchitectures. Vector instructions impr...