We describe our experience with repeated cycles of performance optimization, benchmarking, and performance analysis of the Parallel Ocean Program (POP) on the Cray X1 at Oak Ridge National Laboratory. We discuss the implementation and performance impact of Co-Array Fortran replacements for latency-sensitive communication routines. We also discuss how the performance of the system software evolved from May 2003 to May 2004, and how this affected POP performance.
Cray’s third-generation massively parallel processing system. The system uses a single-processor nod...
Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently ...
After at least a decade of parallel tool development, parallelization of scientific applications rem...
The design of the Parallel Ocean Program (POP) is described with an emphasis on portability. Perform...
Oak Ridge National Laboratory recently installed a 32-processor Cray X1. In this paper, we describe ...
This paper will discuss one of these automatic tools that has been developed recently by Cray Resear...
The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ide...
The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to...
The Advanced Scientific Computers Project of Argonne's Applied Mathematics Division has two objectiv...
The incorporation of increasing core counts in modern processors used to build state-of-the-art supe...
A suite of thirteen large Fortran benchmark codes was run on Cray-2 and Cray X-MP supercomputers. T...
Oak Ridge National Laboratory recently received delivery of a 5,294-processor Cray XT3. The XT3 is C...
This paper describes investigations on the memory performance of the shared memory systems Cray X-MP...
During the last decade the scientific computing community has optimized many applications for execu...
On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (C...