This study examines the performance of the dynamic, irregular, and loosely synchronous class of applications on the KSR1 distributed shared memory COMA system. The Barnes-Hut tree-based algorithm for simulating galactic evolution [1] was chosen as a representative of this class. The performance measures include the overall execution time of the time-stepping loop, the efficacy of the scaling rules (EES and RCTS) proposed in [2], and the computational load balance achieved by the CostZone data partitioning scheme [1] under these scaling rules. We define the notions of geographical locality, transfer locality flux, and partition locality flux to explain the sources of remote memory accesses in the application. The contribu...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
Memory accesses in modern processors are both far slower and vastly more energy-expensive than the a...
Scalability of parallel architectures is an interesting area of current research. Shared memory para...
Scalability of parallel architectures is an interesting area of current research. Shared memory pa...
In many parallel applications, network latency causes a dramatic loss in processor utilization...
The Kendall Square Research KSR1 MPP system has a shared address space, which spreads over physicall...
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
Communications overhead is one of the most important factors affecting performance in mess...
In supercomputing systems, architectural changes that increase computational power are often reflect...
Current microprocessors incorporate techniques to aggressively exploit instruction-leve...
We have developed a hierarchical performance bounding methodology that attempts to explain the perf...
One method to evaluate a distributed shared memory (DSM) system is to analyze its performance for a v...
This paper describes the design and implementation of mechanisms for latency tolerance in the remote...
Efficient global illumination is an important challenge in computer graphics. The main problem of thes...