With the emergence of highly multithreaded architectures, an effective performance monitoring system must reflect the interaction among a large number of concurrent events and attribute the overall effect of individual events and inefficiencies to the operations in the application source code. The state-of-the-art performance counters in highly multithreaded graphics processors do not currently provide this level of precision. Although fine-grained sampling of performance counters after each source-level operation could potentially achieve the desired precision, the high sampling frequency required would likely distort the actual application behavior too much and make the sampled counter values inaccurate. In this thesis, I...
Nowadays, multithreaded architectures are becoming more and more popular. In order to evaluate their...
Cutting-edge science and engineering applications require petascale computing. Petascale computing p...
With rising complexity of high performance computing systems and their parallel software, performanc...
Recent graphics processing units (GPUs) have emerged as a promising platform for general purpose...
In recent years the power wall has prevented the continued scaling of single core performance. This ...
To analyze the performance of applications and architectures, both programmers and architects desire...
The increasing programmability, performance, and cost/effectiveness of GPUs have led to a widespread...
Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memo...
Modern processors incorporate several performance monitoring units, which can be used to count event...
Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Moder...
Performance analysis tools are essential to the maintenance of efficient parallel execution of scien...
The efficiency of concurrent data structures is crucial to the performance of multi-threaded program...
Understanding why the performance of a multithreaded program does not improve linearly with the numb...
Systems for high performance computing are getting increasingly complex. On the one hand, the number...
The use of parallelism enhances the performance of a software system. However, its excessive use can...