Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL's three-dimensional transport code TORT for Cray's macrotasking environment on platforms running the UNICOS operating system. A performance model constructed earlier is reviewed and its main result, namely the identification of the sources of parallelization overhead, is used to motivate the present work. The sources of overhead treated here are: redundant operations in the angular loop across participating tasks; repetitive task creation; and lock utilization to prevent overwriting the flux moment arrays accumulated by the participating tasks. Substantial reduction in the parallelization overhead is demonstrated via sample runs with fixed tunni...
Efficient allocation of distinct subsets of processors to different jobs (i.e., space sharing) is cr...
Modern supercomputers like CRAY X-MP and CRAY Y-MP achieve their high computing speed by using both ...
A large class of computational problems is characterised by frequent synchronisation, and computati...
The multitasking options in the three-dimensional neutral particle transport code TORT originally im...
A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introdu...
The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) method...
ABSTRACT: In this paper, we describe how to write efficient, parallel codes for the Cray XMT™ syste...
The success of parallel computing in solving real-life computationally-intensive problems relies on ...
The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-CO...
Abstract—Increasingly, the main bottleneck limiting performance on emerging multi-core and many-core...
As the efficiency of parallel software increases it is becoming common to measure near linear speedu...
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling c...
time library [1] is a popular C++ parallelization environment [2][3] that offers a set of methods an...
The Boltzmann Transport Equation is solved on unstructured meshes using the Discrete Ordinates Metho...
With the rise of chip-multiprocessors, the problem of parallelizing general-purpose programs has onc...