A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. This extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help identify which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared with measured timings of the code.
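To make the idea of an overhead model concrete, the following is a minimal illustrative sketch, not the model constructed for TORT. It assumes, purely for illustration, that the serial work divides evenly among the tasks, that synchronization costs a fixed time per task, and that the master task spends a fixed time accumulating the results of each worker task. The function names and the parameters `sync_cost` and `accumulate_cost` are hypothetical.

```python
def modeled_time(serial_time, tasks, sync_cost, accumulate_cost):
    """Estimate wall-clock time as the ideal parallel share plus overhead.

    Assumed (hypothetical) form of the overhead:
      synchronization: sync_cost per task
      accumulation:    accumulate_cost per worker task merged by the master
    """
    if tasks < 1:
        raise ValueError("tasks must be >= 1")
    if tasks == 1:
        # Serial run: no multitasking overhead by definition.
        return serial_time
    overhead = sync_cost * tasks + accumulate_cost * (tasks - 1)
    return serial_time / tasks + overhead


def modeled_speedup(serial_time, tasks, sync_cost, accumulate_cost):
    """Speedup relative to the serial version under the same assumptions."""
    parallel_time = modeled_time(serial_time, tasks, sync_cost, accumulate_cost)
    return serial_time / parallel_time


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements of TORT.
    for n in (1, 2, 4, 8):
        print(n, modeled_time(100.0, n, 0.5, 1.0), modeled_speedup(100.0, n, 0.5, 1.0))
```

Comparing such modeled times against measured timings, term by term, is one way to see whether synchronization or accumulation dominates the overhead as the number of tasks grows.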
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
Even fully parallel shared-memory program sections may perform significantly below the ideal speedup o...
Modern supercomputers like CRAY X-MP and CRAY Y-MP achieve their high computing speed by using both ...
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNI-CO...
The multitasking options in the three-dimensional neutral particle transport code TORT originally im...
Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL...
© 2018 The Author(s). Porting scientific key algorithms to HPC architectures requires a thorough und...
Today, most of the Cray multiprocessor systems are still used within a multiprogramming environment....
Many applications from scientific computing and physical simulations can benefit from a mixed task a...
Most performance debugging and tuning of parallel programs is based on the "measure-modify"...
ABSTRACT: In this paper, we describe how to write efficient, parallel codes for the Cray XMT™ syste...
Today most of the multiprocessor supercomputer systems are still used within a multiprogramming envi...
Parallel task-based programming models like OpenMP support the declaration of task data dependences....
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...
We have developed a hierarchical performance bounding methodology that attempts to explain the perfo...