In this paper, we compare the performance of DASH and Cedar under a varied set of parallel scientific workloads. We focus on three main differences between the two machines: cache coherence supported in hardware (DASH) versus managed by the programmer (Cedar), task-based parallelism (DASH) versus loop-based parallelism (Cedar), and the hardware support for exploiting cooperation among the processors in a cluster. In our analysis, we weigh the performance differences against the extra hardware or programming cost involved. Our results indicate that, for several common medium-grain scientific applications, the lack of hardware support for cache coherence is not a major problem: these codes are regular enough for the programmer to manage memory easily. We also note ...
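The claim that regular codes make programmer-managed coherence tractable can be illustrated with a minimal toy model. This is not a description of Cedar's actual memory system. In the model, each processor has a private write-back cache with no hardware coherence, and the programmer must issue explicit flushes and invalidations at synchronization points. The names `Memory`, `PrivateCache`, and `run` are hypothetical. Because the loop partitions the array into statically known contiguous blocks, each processor knows exactly which addresses to flush (its own block) and which to invalidate (every other block).

```python
from typing import Dict, Iterable, List, Set


class Memory:
    """Toy shared main memory: a flat array of values (hypothetical model)."""

    def __init__(self, size: int) -> None:
        self.data: List[float] = [0.0] * size


class PrivateCache:
    """Toy per-processor write-back cache with NO hardware coherence.

    Reads return a locally cached copy if one exists; writes stay local
    until the programmer explicitly flushes them.
    """

    def __init__(self, mem: Memory) -> None:
        self.mem = mem
        self.lines: Dict[int, float] = {}
        self.dirty: Set[int] = set()

    def read(self, addr: int) -> float:
        if addr not in self.lines:  # miss: fetch the current value from memory
            self.lines[addr] = self.mem.data[addr]
        return self.lines[addr]  # hit: cached copy, which may be stale

    def write(self, addr: int, value: float) -> None:
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self, addrs: Iterable[int]) -> None:
        """Programmer-issued write-back of dirty copies to shared memory."""
        for a in addrs:
            if a in self.dirty:
                self.mem.data[a] = self.lines[a]
                self.dirty.discard(a)

    def invalidate(self, addrs: Iterable[int]) -> None:
        """Programmer-issued invalidation: later reads refetch from memory."""
        for a in addrs:
            self.lines.pop(a, None)
            self.dirty.discard(a)


def run(n_procs: int, n: int, manage_coherence: bool) -> List[float]:
    """Two-phase regular loop; returns the array sum seen by each processor."""
    mem = Memory(n)
    caches = [PrivateCache(mem) for _ in range(n_procs)]
    chunk = n // n_procs  # assumes n is divisible by n_procs
    blocks = [range(p * chunk, (p + 1) * chunk) for p in range(n_procs)]

    # Earlier pass: every processor reads the whole array, so each cache
    # now holds a copy of every element (all 0.0).
    for c in caches:
        for i in range(n):
            c.read(i)

    # Phase 1 (parallel loop, simulated sequentially): each processor
    # updates only the block it owns.
    for p, c in enumerate(caches):
        for i in blocks[p]:
            c.write(i, float(i * i))
        if manage_coherence:
            c.flush(blocks[p])  # publish the owned block before the barrier

    # ---- barrier ----
    if manage_coherence:
        for p, c in enumerate(caches):
            others = [i for q in range(n_procs) if q != p for i in blocks[q]]
            c.invalidate(others)  # drop stale copies of other owners' blocks

    # Phase 2: every processor reads the whole array and sums it.
    return [sum(c.read(i) for i in range(n)) for c in caches]
```

With two processors and four elements, `run(2, 4, manage_coherence=True)` returns `[14.0, 14.0]`. Omitting the flush/invalidate step leaves each processor summing stale copies of the other block, so the call returns `[1.0, 13.0]`. In this toy model, a hardware-coherent machine would produce the correct sums without the explicit calls. The cost moves from the programmer into the coherence hardware.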
Many-Task Computing (MTC) is a common scenario for multiple parallel systems, such as clusters, grids...
Scientific programs are typically characterized as floating-point intensive loop-dominated tasks wit...
The last decade has produced enormous improvements in processor speeds without a corresponding impro...
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-...
The growing gap between sustained and peak performance for scientific applications has become a well...
We compare the performance of three major programming models— a load-store cache-coherent shared add...
In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
Benchmarks are essential for objective comparison of computer performance. Established scientific co...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
Abstract—Big and complex applications need many resources and long computation time to execute seque...
In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which ...
The growing gap between sustained and peak performance for scientific applications is a well-known ...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
This thesis deals with how to develop scientific computing software that runs efficiently on multico...