This paper describes the design of a basic communication run-time library for the UPC parallel language and DEC's Memory Channel hardware. It also describes an implementation of Cachet, an adaptive cache coherence protocol for distributed systems, which was added to the basic communication layer to further improve performance. The implementations of two Cachet micro-protocols, Cachet-Base and Cachet-WriterPush, are described. The Cachet cache coherence scheme was implemented entirely in software on Alpha workstations. The results of benchmarks running on the basic communication layer with and without Cachet are presented. These experiments show that the communication optimization can provide a performance improvement of up to an o...
If the trend of integrating more and more cores to a single die continues, general-purpose processor...
Hiding communication latency is an important optimization for parallel programs. Programmers or com...
Caches have the potential to provide multiprocessors with an automatic mechanism for reducing both n...
Global address space languages like UPC exhibit high performance and portability on a broad class o...
Thesis (Ph. D.)--University of Washington, 1997. Two recent trends are affecting the design of medium-...
Global address space languages like UPC exhibit high performance and portability on a broad class of...
Recent developments in shared-memory multiprocessor systems advocate using off-the-shelf hardware to...
This paper considers a large scale, cache-based multiprocessor that is interconnected by a hierarchi...
Unified Parallel C (UPC) is an extension of ANSI C designed for parallel programming. UPC collective...
On the road to computer systems able to support the requirements of exascale applications, Chip Mult...
Partitioned Global Address Space (PGAS) languages appeared to address programmer productivity in lar...
UPC++ is a C++ library that supports high-performance computation via an asynchronous communication ...
The goal of Partitioned Global Address Space (PGAS) languages is to improve programmer productivity ...
Since the invention of the transistor, clock frequency increase was the primary method of improving ...
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...