In this paper, we present two approaches to improve the execution of OpenMP applications on the IBM Cyclops multithreaded architecture. The two solutions are independent, and both aim to improve performance through better management of cache locality. The first is a software modification to the OpenMP runtime library that balances stack accesses across all data caches. The second is a small hardware modification that changes the data cache mapping behavior, with the same goal. Both solutions help parallel applications scale better and perform better on this kind of architecture, and they could also be applied to future multi-core processors. We have executed (using simulation) s...
Cavazos, John. As the high-performance computing (HPC) community continues the push towards exascale ...
Abstract. This paper is motivated by the desire to provide an efficient and scalable software cache...
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extend...
In this paper, we present two approaches to improve the execution of Op...
Multithreaded architectures have the potential of tolerating large memory and functional unit latenc...
Cyclops is a new architecture for high performance parallel computers being developed at the IBM T. ...
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Althou...
Cyclops is a new architecture for high performance parallel computers being developed at the IBM T....
The most widely used node type in high-performance computing nowadays is a 2-socket server node. The...
Abstract. The performance of OpenMP applications executed in multisocket multicore processors can be l...
In this work, we present an OpenMP implementation suitable for multiprogrammed environments on Intel...
This paper presents COBRA (Continuous Binary Re-Adaptation), a runtime binary optimization framework...
In [8], we demonstrated that contrary to sequential applications, parallel Ope...
Simultaneous multithreading is a technique that can improve performance when running parallel applic...
Montagna, Fabio; Tagliavini, Giuseppe; Rossi, Davide; Garofalo, Angelo; Benini, Luca. Monta...