Current high-performance computing architectures are composed of large shared-memory NUMA nodes, among other components. Such nodes are becoming increasingly complex, as they contain several NUMA domains whose access latencies depend on the core that issues the access. In this work, we propose techniques based on graph partitioning that efficiently mitigate the negative impact of NUMA effects on the performance of parallel applications. These techniques improve the execution time of OpenMP parallel codes by 2.02× on average when run on architectures with strong NUMA effects.
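The abstract does not say which partitioning algorithm is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical task graph whose edge weights stand for the amount of data two tasks share, and it uses a simple greedy heuristic to place tasks on NUMA domains so that heavily communicating tasks land in the same domain while domains stay balanced. The names `greedy_partition`, `cut_weight`, and the example graph are all illustrative assumptions; production partitioners typically rely on more sophisticated multilevel schemes.

```python
from collections import defaultdict
from math import ceil


def greedy_partition(edges, num_tasks, num_domains):
    """Assign each task to a NUMA domain, keeping domains balanced.

    edges: dict mapping (u, v) task pairs to a weight (assumed here to
    model the data shared by the two tasks).
    Returns a list where entry t is the domain chosen for task t.
    """
    # Build an undirected adjacency map from the weighted edge list.
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w

    capacity = ceil(num_tasks / num_domains)  # maximum tasks per domain
    load = [0] * num_domains
    assignment = [None] * num_tasks

    for task in range(num_tasks):
        # Weight of edges from this task to tasks already placed in each domain.
        gain = [0] * num_domains
        for other, w in neighbours[task].items():
            if assignment[other] is not None:
                gain[assignment[other]] += w

        # Among domains with room, prefer highest gain, then lowest load,
        # then lowest domain index.
        candidates = [d for d in range(num_domains) if load[d] < capacity]
        best = max(candidates, key=lambda d: (gain[d], -load[d], -d))
        assignment[task] = best
        load[best] += 1

    return assignment


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints sit in different domains."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


# Hypothetical graph: two tightly coupled groups of three tasks,
# joined by a single light edge between tasks 2 and 3.
edges = {
    (0, 1): 10, (0, 2): 10, (1, 2): 10,
    (3, 4): 10, (3, 5): 10, (4, 5): 10,
    (2, 3): 1,
}
assignment = greedy_partition(edges, num_tasks=6, num_domains=2)
print(assignment)                     # [0, 0, 0, 1, 1, 1]
print(cut_weight(edges, assignment))  # 1
```

In this toy case each tightly coupled group ends up in its own domain, so only the light edge crosses domains. The cut weight serves as a rough proxy for remote memory traffic under the stated assumption; how well it predicts actual NUMA penalties depends on the machine and the application.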
Abstract. OpenMP has become the dominant standard for shared memory programming. It is traditionall...
Graph algorithms on parallel architectures present an interesting case study for irregular applicat...
Parallel task-based programming models like OpenMP support the declaration of task data dependences....
The complexity of shared memory systems is becoming more relevant as the number of memory domains in...
The importance of high-performance graph processing to solve big data problems targeting high-impact...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
Pattern matching on large graphs is the foundation for a variety of application domains. The continu...
Graph-structured analytics has been widely adopted in a number of big data applications such as soci...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
The graph partitioning problem is critical to many traditional applications such as work balancing ...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify t...
The architecture of supercomputers is evolving to expose massive parallelism. ...
Graph-structured data can be found in nearly every aspect of today's world, be it road networks, soc...