In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a large number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work from the authors proposed lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection m...
We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU cluste...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2009. In the past, only a small ...
Conventional lock implementations serialize access to critical sections guarded by the same lock, pr...
Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, e...
Graphics Processing Units (GPUs) have become the accelerator of choice for data-parallel application...
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), ...
The continued evolution of GPUs has enabled the use of irregular algorithms which involve fine-grai...
Transactional Memory (TM) aims to make shared memory parallel programming easier by abstracting away...
In this dissertation, we explore multiple designs for a Distributed Transactional Memory framework f...
Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one ...
The recent trend of multicore CPUs pushes for major changes in software development. Traditional sin...
The heterogeneous Accelerated Processing Units (APUs) integrate a multi-core CPU and a GPU within th...
Parallel programming presents an efficient solution to exploit future multicore processors. Unfortu...
There has been considerable recent interest in the support of transactional memory (TM) in both har...
The increasing ubiquity of chip multiprocessor machines has made the need for accessible approac...