Distributed shared-memory systems provide scalable performance and a convenient model for parallel programming. However, their non-uniform memory latency often makes it difficult to develop efficient parallel applications. Future systems should reduce communication cost to achieve better programmability and performance. We have developed a methodology, and implemented a suite of tools, to guide the search for improved codes and systems. As the result of one such search, we recommend a remote data caching technique that significantly reduces communication cost. We analyze applications by instrumenting their assembly-code sources. During execution, an instrumented application pipes a detailed trace to configuration independent (CIAT) and c...
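The pipeline this abstract describes — an instrumented application streaming a detailed trace to downstream analysis tools — can be sketched minimally. The record format, the placement policy, and every name below are illustrative assumptions, not the actual interface of the CIAT tool or the instrumentation suite:

```python
# Minimal sketch of a trace-driven analysis pipeline: an "instrumented" run
# emits one memory-access record per event, and a downstream analyzer
# consumes the stream and tallies how many accesses would be remote under a
# simple round-robin block placement. All names, the record format, and the
# placement policy are hypothetical, chosen for illustration only.

from collections import Counter

def trace_events():
    """Stand-in for an instrumented application: yields (cpu, address) events."""
    accesses = [(0, 0x1000), (0, 0x2040), (1, 0x1000), (1, 0x3080), (0, 0x3080)]
    for cpu, addr in accesses:
        yield cpu, addr

def analyze(events, num_nodes=2, block_bytes=0x1000):
    """Configuration-dependent pass: count local vs. remote accesses,
    assuming memory blocks are distributed round-robin across nodes."""
    stats = Counter()
    for cpu, addr in events:
        home = (addr // block_bytes) % num_nodes  # node that owns this block
        stats["local" if home == cpu else "remote"] += 1
    return stats

stats = analyze(trace_events())
print(dict(stats))  # e.g. {'remote': 2, 'local': 3} for the sample trace
```

Separating the trace producer from the analyzer is what makes the configuration-independent/configuration-dependent split possible: the same trace can be replayed against many candidate memory configurations without re-running the application.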
Large-scale multiprocessors suffer from long latencies for remote accesses. Caching is by far the mo...
Distributed memory parallel architectures support a memory model where some memory accesses are loca...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
Thesis (Ph. D.)--University of Washington, 1997. Two recent trends are affecting the design of medium-...
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory ...
Since the invention of the transistor, clock frequency increase was the primary method of improving ...
This paper describes the design and implementation of mechanisms for latency tolerance in the remote...
Shared memory is widely regarded as a more intuitive model than message passing for the development ...