Heterogeneous SoCs (HeSoCs) typically share a single DRAM between the CPU and GPU, making workloads susceptible to memory interference and predictable execution difficult. State-of-the-art predictable execution models (PREM) for HeSoCs prefetch data to the GPU scratchpad memory (SPM) so that computations are insensitive to CPU-generated DRAM traffic. However, the amount of work that the small SPM sizes allow is typically insufficient to absorb CPU/GPU synchronization costs. On-chip caches are larger and would solve this issue, but have been argued to be too unpredictable due to self-evictions. We show how self-eviction can be minimized in GPU caches via careful management of prefetches, thus lowering the performance cost, while retaining timing p...
As time predictability is critical to hard real-time systems, it is not only necessary to accurately...
Graphics Processing Units (GPUs) have been shown to be effective at achieving large speedups over co...
Many applications require both high performance and predictable timing. High-performance can be prov...
Heterogeneous systems-on-a-chip are increasingly embracing shared memory designs, in which a single ...
The ever-increasing need for computational power in embedded devices has led to the adoption of heterog...
Modern heterogeneous systems-on-chip (HeSoC) feature high-performance multi-core CPUs tightly integr...
Long memory latency and limited throughput become performance bottlenecks of GPGPU applications. The...
The deployment of real-time workloads on commercial off-the-shelf (COTS) hardware is attracti...
With the SIMT execution model, GPUs can hide memory latency through massive multithreading ...
Multi-Processor Systems-on-Chip (MPSoC) platforms will definitely power various future autonomous mac...
Modern embedded platforms are known to be constrained by size, weight and power (SWaP) requirements....