Effective use of large-scale multiprocessors requires the elimination of all bottlenecks that reduce processor utilization. One such bottleneck is memory contention. In this paper we show that memory contention occurs in many parallel applications when they are run on large-scale shared-memory multiprocessors. In our simulations of several parallel applications on a large-scale machine, we observed that some applications exhibit near-perfect speedup on hundreds of processors when the effect of memory contention is ignored, yet exhibit no speedup at all when memory contention is considered. As the number of processors is increased, many applications exhibit an increase in both the number of hot spots and in the degree of conte...
A Transactional Memory API uses contention managers to guarantee that whenever two transactions ha...
Most complexity measures for concurrent algorithms for asynchronous shared-memory architec...
In highly-pipelined machines, instructions and data are prefetched and buffered in both the processo...
Shared-memory multiprocessors built from commodity microprocessors are increasingly being used to pr...
Scalable multiprocessors that support a shared-memory image to application programmers are typically...
Thesis (Ph. D.)--University of Washington, 1987. Shared-memory multiprocessors offer increased computa...
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory ...
Memory contention can be a major source of overhead in large-scale shared-memory multiprocessors. Al...
An important architectural design decision affecting the performance of coherent caches in shared-me...
We demonstrate the profound effects of contention on the performance of page-based software distribu...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
Current architecture trends result in processors being equipped with more cores and larger shared c...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory archit...