In this thesis, we studied the behavior of parallel programs to understand how to automate the task of hiding the latency of paging, input/output, and communication operations on massively parallel processing systems. We designed a parallel performance monitoring environment, the Musketeers, and investigated its use to improve the performance of parallel programs on DMIMD systems. In designing the Musketeers, we examined the interference of the instrumentation with the execution of parallel programs and presented some alternatives to minimize its effects. Since collecting performance data by monitoring program execution is only the first step to understanding the behavior of programs, we provided several customized monitoring environments and an...
Effective overlap of computation and communication is a well understood technique for latency hiding...
Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memo...
Current microprocessors incorporate techniques to aggressively exploit instruction-leve...
We present a suite of tools, the Musketeers, for monitoring and analysis of paging, I/O and communic...
In this paper we introduce a methodology for the analysis of the paging activity of parallel programs...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Traditional performance analysis techniques are performed after a parallel program has comp...
This dissertation presents two new developments in the area of computer program preparation for para...
The CPUs, memory, interconnection network, operating system, runtime system, I/O subsystem, and appl...
A fundamental problem with parallel program monitoring tools is the intrusiveness introduced by inst...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1996. Designing high performance...
It is easy to find errors and inefficient parts of a sequential program by using a standard debugge...
To understand or improve the execution behavior of a program on a parallel system, it is often neces...
Parallel architectures, like the transputer-based multicomputer network, offer potentially enormous...