This paper describes how specific architectural and implementation aspects of an SMP system (in the present work, an IBM POWER3 16-way node) influence the performance and scalability of a very simple but real-world code. The OpenMP parallelization of a Lattice Boltzmann Method code is presented. Scaling up to more CPUs and bigger grids successively exposes limitations arising from the hardware implementation. A sequence of substantial changes to the original implementation, aimed at overcoming those obstacles, is detailed. The case study shows that an SMP system cannot simply be modelled as a collection of processors sharing memory: making a big memory area available to each CPU in a single application can trigger processor implementation limits, ...
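The abstract does not include the paper's code. The following is only a rough illustration of what an OpenMP-parallelized LBM kernel can look like. It is a minimal sketch, and it assumes several things the abstract does not state:

- a 2D D2Q9 BGK model (chosen for brevity; the paper's model is not stated);
- a fully periodic grid;
- a structure-of-arrays layout;
- a fused, pull-style stream-and-collide step that uses two separate arrays.

The function names `lbm_init`, `lbm_step` and `lbm_mass` and the parameter `tau` are hypothetical. The paper's POWER3 implementation may differ substantially.

```c
#include <assert.h>
#include <stdlib.h>

/* D2Q9 lattice: standard discrete velocities and weights. */
static const int EX[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int EY[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double W[9] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                            1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

/* Index of population q at cell (x, y); structure-of-arrays layout (an assumption). */
static size_t idx(int nx, int ny, int q, int x, int y) {
    return ((size_t)q * ny + y) * nx + x;
}

/* Second-order BGK equilibrium for direction q. */
static double feq(int q, double rho, double ux, double uy) {
    double eu = EX[q] * ux + EY[q] * uy;
    double uu = ux * ux + uy * uy;
    return W[q] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

/* Hypothetical helper: allocate a grid at rest with a small density bump in the middle. */
double *lbm_init(int nx, int ny) {
    double *f = malloc((size_t)9 * nx * ny * sizeof *f);
    if (!f) return NULL;
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++) {
            double rho = 1.0 + ((x == nx / 2 && y == ny / 2) ? 0.1 : 0.0);
            for (int q = 0; q < 9; q++)
                f[idx(nx, ny, q, x, y)] = feq(q, rho, 0.0, 0.0);
        }
    return f;
}

/* One fused stream-and-collide step from src into dst (src and dst must not alias).
   Rows are split statically across OpenMP threads; each thread writes dst only
   at its own cells, so there are no write races. */
void lbm_step(const double *src, double *dst, int nx, int ny, double tau) {
    assert(nx > 0 && ny > 0 && tau > 0.5);
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < ny; y++) {
        for (int x = 0; x < nx; x++) {
            double f[9], rho = 0.0, ux = 0.0, uy = 0.0;
            /* Pull: gather the populations arriving from upwind neighbours (periodic wrap). */
            for (int q = 0; q < 9; q++) {
                int xs = (x - EX[q] + nx) % nx;
                int ys = (y - EY[q] + ny) % ny;
                f[q] = src[idx(nx, ny, q, xs, ys)];
                rho += f[q];
                ux += EX[q] * f[q];
                uy += EY[q] * f[q];
            }
            ux /= rho;
            uy /= rho;
            /* Collide: relax each population toward equilibrium with relaxation time tau. */
            for (int q = 0; q < 9; q++)
                dst[idx(nx, ny, q, x, y)] = f[q] - (f[q] - feq(q, rho, ux, uy)) / tau;
        }
    }
}

/* Total mass: the sum of all populations over the grid. */
double lbm_mass(const double *f, int nx, int ny) {
    double m = 0.0;
    for (size_t i = 0; i < (size_t)9 * nx * ny; i++) m += f[i];
    return m;
}
```

A typical driver allocates two grids, calls `lbm_step(a, b, nx, ny, tau)` once per time step and swaps the two pointers afterwards. Total mass should remain constant up to rounding.

The design aims to make the OpenMP parallelization trivial. Because the step reads only `src` and writes only `dst`, a plain `parallel for` over rows needs no synchronization. The price is two full copies of the grid in memory, a cost that grows with the large grids the abstract mentions.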
In this paper we address the problem of identifying and exploiting techniques that optimize the perf...
Numerical analysts and programmers are currently facing a conceptual change in processor technology....
Dense linear algebra libraries need to cope efficiently with a range of input problem sizes and shap...
The most widely used node type in high-performance computing nowadays is a 2-socket server node. The...
Lattice Boltzmann Methods (LBM) are an established mesoscopic approach for simulating a wide variety...
Today's supercomputers often consist of clusters of SMP nodes. Both OpenMP and MPI are programming ...
In this paper, we present the first system that implements OpenMP on a network of shared-memory mult...
Lattice Boltzmann Methods (LBM) are an established approach for simulating a wide variety of transpo...
This paper presents a new parallel programming environment called ParADE to enable easy, portable, ...
The novel ScaleMP vSMP architecture employs commodity x86-based servers with an InfiniBand network t...
Ongoing research towards the development of a hybrid parallelization concept for lattice Bol...
With computer simulations, real-world phenomena can be analyzed in great detail. Computational fluid ...
In the coming years, the first exascale-class supercomputers will go online. In this paper, we portray...
The architecture of high performance computing systems is becoming more and more heterogene...
The architecture of high performance computing systems is becoming more and more heterogeneo...