As the end of Moore's law approaches, more specialized devices such as GPUs, FPGAs, and AI accelerators are taking over workloads that traditionally ran on the CPU; this offloading enables specialized solutions that improve the execution time of specific applications. One of the main problems with this approach is that the data is no longer centralized in a single main memory but distributed among the different accelerators, each of which needs correct and coherent data to perform its operations. This can limit the performance an accelerator can achieve, and it delegates to the programmer the task of enforcing coherence between memories. To relieve this model, in which the programmer has to take into ac...
Virtual memory based cache coherence is a mechanism that relies only on hardware that already exi...
2018-02-23 Graphics Processing Units (GPUs) are designed primarily to execute multimedia, and game re...
Current commercial solutions intended to provide additional resources to an application bei...
The end of Dennard scaling and Moore's law has motivated a rise in the use of parallelism and hardwa...
The most common model to use co-processors/accelerators is the master-slave model where the slaves ...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
This work describes a cache architecture and memory model for 1000+ core microprocessors. Our appro...
In programming high performance applications, shared address-space platforms are preferable for fine...
Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up....
Commodity accelerator technologies including reconfigurable devices provide an order of magnitude pe...
Modern parallel programming frameworks like OpenMP often rely on shared memory concepts to h...
On the road to computer systems able to support the requirements of exascale applications, Chip Mult...
The goal of this project was to improve the performance of large scientific and engineering...
Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory...