[International audience] This work presents a realistic performance model to execute scientific workflows on high-bandwidth-memory architectures such as the Intel Knights Landing. We provide a detailed analysis of the execution time on such platforms, taking into account transfers from both fast and slow memory and their overlap with computations. We discuss several scheduling and mapping strategies: not only must tasks be assigned to computing resources, but one also has to decide which fraction of input and output data will reside in fast memory and which will have to stay in slow memory. We use extensive simulations to assess the impact of the mapping strategies on performance. We also conduct experiments for a simple 1D Gauss-Seidel kernel...
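The trade-off described in this abstract (data split between fast and slow memory, with transfers overlapping computation) can be illustrated with a minimal single-task sketch. This is not the authors' model: the names `Task`, `task_time`, and `best_fraction`, the assumption that fast- and slow-memory transfers proceed concurrently and fully overlap with computation, and all numbers below are hypothetical. Scheduling a whole workflow across cores is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description (not taken from the paper)."""
    flops: float       # amount of computation, in floating-point operations
    data_bytes: float  # total input + output volume, in bytes


def task_time(task: Task, frac_fast: float, peak_flops: float,
              bw_fast: float, bw_slow: float) -> float:
    """Execution time of one task when a fraction frac_fast of its data
    resides in fast memory and the rest stays in slow memory.

    Assumption (hypothetical): fast- and slow-memory transfers run
    concurrently and fully overlap with computation, so the task is
    bounded by the slowest of the three activities.
    """
    if not 0.0 <= frac_fast <= 1.0:
        raise ValueError("frac_fast must be in [0, 1]")
    t_compute = task.flops / peak_flops
    t_fast = frac_fast * task.data_bytes / bw_fast
    t_slow = (1.0 - frac_fast) * task.data_bytes / bw_slow
    return max(t_compute, t_fast, t_slow)


def best_fraction(task: Task, bw_fast: float, bw_slow: float,
                  fast_capacity: float) -> float:
    """Fraction of the task's data to place in fast memory under task_time.

    t_fast grows and t_slow shrinks as frac_fast increases, so their
    maximum is smallest where the two are equal; if fast memory cannot
    hold that much data, the best choice is to fill it completely.
    """
    if task.data_bytes <= 0:
        return 0.0  # nothing to place
    balanced = bw_fast / (bw_fast + bw_slow)
    capacity_limit = min(1.0, fast_capacity / task.data_bytes)
    return min(balanced, capacity_limit)
```

For example, with 4 GB of data and a fast memory four times faster than the slow one (400e9 vs. 100e9 bytes/s), `best_fraction` returns 0.8 when fast memory can hold everything; with only 2 GB of fast memory it is capped at 0.5, and the slow-memory transfer then dominates the task time. These values are illustrative, not measurements.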
[International audience] The ever-growing complexity and scale of parallel architectures make it necessary to rew...
In this work, a model of computation for shared memory parallelism is presented. To address fundamen...
The proliferation of multi-core, accelerator-enabled embedded systems has introduced new opportuniti...
This work presents a realistic performance model to execute scientific workflows on high-bandwidth m...
[International audience] The increasing computation capability of servers comes with a dramatic increas...
High-performance computing (HPC) demands huge memory bandwidth and computing resources to achieve ma...
[International audience] In distributed memory systems, it is paramount to develop strategies to overla...
[International audience] Multicore architectures featuring specialized accelerators are getting an incr...
[International audience] Due to the advent of modern hardware architectures of high-performance comput...
The memory system has been evolving at a fast pace recently, driven by the emergence of large-scale ...
This electronic version was submitted by the student author. The certified thesis is available in th...
Since Graphics Processing Units (GPUs) have increasingly gained popularity among non-graphic and co...
Analytical performance models yield valuable architectural insight without incurring the excessive r...
We have developed a hierarchical performance bounding methodology that attempts to explain the perfo...