We describe how two-level memory hierarchies can be exploited to optimize the implementation of teams in the parallel facet of the upcoming Fortran 2015 standard. We focus on reducing the cost associated with moving data within a computing node and between nodes, finding that this distinction is of key importance when looking at performance issues. We introduce a new hardware-aware approach for PGAS, to be used within a runtime system, to optimize communications in the virtual topologies and clusters that bind different teams together. We have applied this methodology to three important collective operations, namely barrier, all-to-all reduction, and one-to-all broadcast, and implemented it in the OpenUH CAF compiler…
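As a rough illustration of the two-level idea in this abstract, the sketch below uses Fortran 2018 teams (the standardized form of the Fortran 2015 proposal) to form one team per compute node and run a barrier in two phases: first among images sharing a node, then across all images. This is a minimal sketch, not the paper's runtime: the fixed images_per_node mapping is an assumption standing in for real hardware topology discovery.

    ! Minimal sketch of a two-phase, hierarchy-aware barrier using
    ! Fortran 2018 teams. The fixed images_per_node mapping is an
    ! assumption; a real runtime would query the hardware topology.
    program hierarchical_barrier
      use, intrinsic :: iso_fortran_env, only: team_type
      implicit none
      integer, parameter :: images_per_node = 4   ! assumed node width
      type(team_type) :: node_team
      integer :: node_id

      ! Group images that (by assumption) share a compute node into one team.
      node_id = (this_image() - 1) / images_per_node + 1
      form team (node_id, node_team)

      change team (node_team)
        sync all    ! phase 1: intra-node barrier within the node team
      end team

      sync all      ! phase 2: inter-node barrier across the initial team
    end program hierarchical_barrier

Splitting the barrier this way keeps the first synchronization phase inside shared memory, which is the cost distinction the abstract highlights between intra-node and inter-node data movement.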
The emergence of multicore processors has led to increasing complexity inside modern servers, wit...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
Technology trends suggest that future machines will rely on parallelism to meet increasing performan...
The Fortran 2008 language standard added a feature called "coarrays" to allow parallel programming i...
The Message Passing Interface (MPI) is the library-based programming model employed by most scalable...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Fortran remains a very widely used programming language for technical computing. Fortran coarrays ar...
Languages and libraries based on the Partitioned Global Address Space (PGAS) programming model have ...
Most parallel systems on which MPI is used are now hierarchical: some processors are much...
Large-scale parallel simulations are fundamental tools for engineers and scientists. Consequently, i...
Partitioned global address space (PGAS) languages like UPC or Fortran provide a global name space to...
This paper describes a novel methodology for implementing a common set of collective communication o...
Optimized collective operations are a crucial performance factor for many scientific applications. T...