Many scientific applications need to solve a large number of small, independent problems. Individually, these problems do not expose enough parallelism, so they must be computed as a batch. Vendors such as Intel and NVIDIA are now developing their own suites of batch routines. Although most work focuses on batches of fixed size, real applications cannot assume a uniform size across all problems in a set. We explore and analyze different strategies based on the OpenMP parallel for, task, and taskloop pragmas. Although these strategies are straightforward from a programmer's point of view, they differ in their impact on performance. We also analyze a new prototype provided by Intel (MKL), which deals with batch ...
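To make the three OpenMP strategies named above concrete, the following is a minimal sketch in C, not the authors' implementation. It assumes each problem is a small square matrix product of its own size; the kernel `small_gemm`, the three `batch_*` function names, the `schedule(dynamic)` clause, and the `grainsize(1)` clause are all illustrative assumptions that do not come from the abstract.

```c
#include <assert.h>

/* Hypothetical kernel: C = A * B for one small n x n row-major problem. */
static void small_gemm(int n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Strategy 1: worksharing loop over the batch. The dynamic schedule is an
 * assumption, chosen here because the problems have different sizes. */
void batch_parallel_for(int count, const int *n, double *const *A,
                        double *const *B, double *const *C)
{
    #pragma omp parallel for schedule(dynamic)
    for (int p = 0; p < count; ++p)
        small_gemm(n[p], A[p], B[p], C[p]);
}

/* Strategy 2: one thread creates one explicit task per problem. */
void batch_task(int count, const int *n, double *const *A,
                double *const *B, double *const *C)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int p = 0; p < count; ++p) {
            #pragma omp task firstprivate(p)
            small_gemm(n[p], A[p], B[p], C[p]);
        }
    } /* all tasks have completed once the parallel region ends */
}

/* Strategy 3: taskloop lets the runtime split the loop into tasks.
 * grainsize(1) (one iteration per task) is an assumption. */
void batch_taskloop(int count, const int *n, double *const *A,
                    double *const *B, double *const *C)
{
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskloop grainsize(1)
        for (int p = 0; p < count; ++p)
            small_gemm(n[p], A[p], B[p], C[p]);
    }
}
```

All three functions take the same arguments: the number of problems, an array of per-problem sizes, and arrays of pointers to each problem's matrices. With GCC, the pragmas take effect when compiling with `-fopenmp`; without that flag the code still runs correctly, but serially. Which variant performs best depends on the size distribution of the batch and on the runtime, which is the kind of comparison the abstract describes.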
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
This paper proposes an efficient heuristic algorithm for solving a complex bat...
In a general-purpose computing system, several parallel applications run simultaneously on the same ...
A current trend in high-performance computing is to decompose a large linear algebra problem into ...
The OpenMP programming model provides parallel applications with a very important feature: job malleabili...
In the context of multicore programming, pipeline parallelism is a solution to...
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extend...
During the last decade, managed runtime systems have been constantly evolving to become capable of e...
The concept of task already exists in many parallel programming models. Programmers express parallel...
In large-scale parallel computing that may contain many nodes, a computing task is often divided int...
Computing platforms are now extremely complex providing an increasing number o...
Sparse direct solvers is a time consuming operation required by many scientifi...
The need for parallel programming models that are simple to use and at the same time efficient for c...
GPU devices are becoming a common element in current HPC platforms due to their high performance-per...