We investigate the construction and application of parallel software caches in shared memory multiprocessors. In contrast to maintaining a private cache for each thread, a parallel cache allows the re-use of results of lengthy computations by other threads. This is especially important in irregular applications where the re-use of intermediate results by scheduling is not possible. Example applications are the computation of intersections between a scanline and a polygon in computational geometry, and the computation of intersections between rays and objects in ray tracing. A parallel software cache is based on a readers/writers lock, i.e. as long as no thread alters the cache data structure, multiple threads may read simultaneously. If a t...
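The readers/writers idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ReadWriteLock`, `ParallelCache`, and `compute` are hypothetical, and the lock shown here favors readers, so a steady stream of readers could delay a writer. The sketch assumes that a lookup takes the lock in read mode, that a miss is computed outside the lock, and that only the insertion takes the lock in write mode.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Hypothetical minimal readers/writers lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of threads currently reading
        self._writing = False  # True while one thread holds the write lock

    @contextmanager
    def read(self):
        with self._cond:
            while self._writing:          # readers wait only for a writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:    # last reader wakes waiting writers
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writing or self._readers:  # writer needs exclusivity
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class ParallelCache:
    """Hypothetical cache shared by all threads instead of one per thread."""

    def __init__(self, compute):
        self._compute = compute  # the lengthy computation to be cached
        self._data = {}
        self._lock = ReadWriteLock()

    def get(self, key):
        with self._lock.read():           # many threads may read at once
            if key in self._data:
                return self._data[key]
        value = self._compute(key)        # miss: compute outside the lock
        with self._lock.write():          # exclusive access to alter the cache
            # Another thread may have inserted the key meanwhile; keep the first.
            return self._data.setdefault(key, value)
```

With this sketch, `ParallelCache(lambda n: n * n).get(3)` computes the value once, and later calls from any thread are served under the read lock. Two threads that miss on the same key at the same time may both run `compute`; the sketch accepts that duplicated work to keep readers unblocked during the computation.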
An adaptive cache coherence mechanism exploits semantic information about the expected or observed a...
Maximal utilization of cores in multicore architectures is key to realizing the potential performance ...
On the road to computer systems able to support the requirements of exascale applications, Chip Mult...
Reordering instructions and data layout can bring significant performance improvement for memory bou...
Technical report. The next generation of scalable parallel systems (e.g., machines by KSR, Convex, and...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of d...
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a ch...
Cache thrashing due to true data sharing can degrade the performance of parallel programs si...
Applications with regular patterns of memory access can experience high levels of cache conflict mis...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2011. Computer architects have e...
The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds sig...