Parallel programming has always been difficult due to the complexity of hardware and the diversity of applications. Although significant progress has been achieved over the years, attaining high parallel efficiency on large supercomputers for various applications is still quite challenging. As we go beyond the current scale of computers to those with peak capacities of an ExaFLOP/s, it is clear that an introspective and adaptive runtime system (RTS) will be critical to reduce programmers' tuning efforts by automatically handling the complexities of applications and machines. This is the motivation for my research on a Performance-analysis-based Introspective Control System - PICS. PICS intelligently steers parallel applications and ...
A considerable fraction of scientific discovery nowadays relies on computer simulations. High Per...
This paper continues the discussion of parallel tool support with an overview of the current state o...
Abstract — A well organized parallel application can accomplish better performance over sequential e...
The tuning of parallel programs on large distributed-memory machines today is usually a costly, and ...
While parallel computing offers an attractive perspective for the future, developing efficient paral...
This paper describes a new parallel program tuning framework, with a new approach for tuning. The ap...
Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memo...
A large and important class of national challenge applications are irregular, with complex, data dep...
The relative ease with which it is possible to build inexpensive, high-performance multicomputers u...
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...
Future supercomputers will require application developers to expose much more parallelism than curre...
Abstract — Performance of parallel programs is one of the reasons for their development. The process ...
It is desirable for general productivity that high-performance computing applications be portable to...
Increasing demands in the field of high performance scientific computing have led to supercomputers ...