In high-performance computing, an application's workload must be evenly balanced among threads to deliver cutting-edge performance and scalability. In OpenMP, the load balancing problem arises when scheduling loop iterations to threads. Several scheduling strategies have been proposed in this context, but they do not take the application's input workload into account and thus turn out to be suboptimal. In this work, we introduce a design methodology to propose, study, and assess the performance of workload-aware loop scheduling strategies. In this methodology, a genetic algorithm is employed to explore the solution space of the problem and to guide the design of new loop scheduling strategies, and a...
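To make the idea of exploring the solution space with a genetic algorithm concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the cost of every loop iteration is known in advance and that a schedule is simply a list mapping each iteration to a thread. All names (`makespan`, `static_schedule`, `genetic_schedule`) and all parameter values are hypothetical and chosen for illustration only.

```python
import random


def makespan(assignment, workloads, n_threads):
    """Load of the most heavily loaded thread (lower means better balance)."""
    loads = [0.0] * n_threads
    for iteration, thread in enumerate(assignment):
        loads[thread] += workloads[iteration]
    return max(loads)


def static_schedule(n_iterations, n_threads):
    """Contiguous equal-sized blocks, similar in spirit to schedule(static)."""
    chunk = -(-n_iterations // n_threads)  # ceiling division
    return [i // chunk for i in range(n_iterations)]


def genetic_schedule(workloads, n_threads, pop_size=40, generations=200,
                     mutation_rate=0.05, seed=0):
    """Search for an iteration-to-thread mapping with a small makespan.

    Assumption: per-iteration workloads are known and there are at least
    two iterations. The operators below are generic textbook choices.
    """
    rng = random.Random(seed)
    n = len(workloads)

    def fitness(assignment):
        return makespan(assignment, workloads, n_threads)

    # Seed the population with the static schedule; together with elitist
    # selection this guarantees the result is never worse than static.
    population = [static_schedule(n, n_threads)]
    population += [[rng.randrange(n_threads) for _ in range(n)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n):  # random-reset mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_threads)
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


if __name__ == "__main__":
    # Skewed workload: a few heavy iterations followed by many light ones.
    workloads = [9.0, 8.0, 7.0, 6.0] + [1.0] * 8
    static = static_schedule(len(workloads), 4)
    evolved = genetic_schedule(workloads, 4)
    print("static makespan: ", makespan(static, workloads, 4))
    print("evolved makespan:", makespan(evolved, workloads, 4))
```

On a skewed input such as the one in the usage example, the static block assignment places all heavy iterations on the first thread, which is the kind of imbalance a workload-aware strategy is meant to avoid. Schedules found this way can serve as a reference point when judging how close a practical strategy comes to a well-balanced assignment.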
The scheduling of loops for architectures which support instruction level parallelism is an importan...
This paper presents how we have achieved the parallelization of Aevol, a biolo...
Approaching the theoretical performance of hierarchical multicore machines req...
In high-performance computing, the application's workload must be evenly balan...
The input workload of an irregular application must be evenly distributed among its threads to enable...
In High Performance Computing, the application's workload must be well balance...
Workload-aware loop schedulers were introduced to deliver better performance than c...
Workload-aware loop schedulers were introduced to deliver better performance t...
The High Performance Computing community seeks efficient and scalable solutions to meet the ever...
Nowadays shared memory HPC platforms expose a large number of cores organized ...
In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by ...
Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing cri...
Parallel applications are highly irregular and high-performance computing (HPC) infrastructures are ...
Parallel applications are highly irregular and high performance computing (HPC) infrastructures are ...
Scientific applications are large, complex, irregular, and computationally intensive and are charact...