This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that, due to the low remote-to-local memory access latency ratio of state-of-the-art ccNUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution of pages, incur modest performance losses. We also show that performance leaks stemming from suboptimal page placement schemes can be remedied with a smart user-level page migration engine. The main body of the paper describes how the OpenMP runtime environment can use page migration f...
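As a rough illustration of what round-robin page placement means for locality, the sketch below counts how many pages end up on a remote node. It is not taken from the paper: the function names, the one-thread-per-node setup, and the block-partitioned access pattern are all assumptions made for the example, and it counts pages only, without modelling access latency.

```python
def round_robin_node(page, num_nodes, num_pages):
    """Round-robin placement: page i lives on node i mod N."""
    return page % num_nodes


def block_node(page, num_nodes, num_pages):
    """Hypothetical ideal placement: each page lives on the node that uses it."""
    return page * num_nodes // num_pages


def remote_fraction(num_pages, num_nodes, placement):
    """Fraction of pages that sit on a node other than the one accessing them.

    Assumption: one thread per node, and thread t touches only the t-th
    contiguous block of pages (a simplified workload for illustration).
    """
    remote = 0
    for page in range(num_pages):
        accessing_node = block_node(page, num_nodes, num_pages)
        if placement(page, num_nodes, num_pages) != accessing_node:
            remote += 1
    return remote / num_pages


if __name__ == "__main__":
    print(remote_fraction(1024, 4, round_robin_node))
    print(remote_fraction(1024, 4, block_node))
```

Under these assumptions, round-robin leaves a fraction 1 - 1/N of the pages remote on N nodes. How much that costs in practice depends on the remote-to-local latency ratio, which is the quantity the abstract argues is low on the systems studied.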
Application virtual address space is divided into pages, each requiring a virtual-to-physical transl...
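The division of an address space into pages can be sketched as simple integer arithmetic. The 4 KiB page size and the helper names below are assumptions for illustration only; the abstract does not state a page size, and real systems may use several.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages; not specified in the abstract


def split_address(vaddr, page_size=PAGE_SIZE):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr // page_size, vaddr % page_size


def pages_spanned(start, length, page_size=PAGE_SIZE):
    """Number of pages touched by a buffer of `length` bytes (length > 0)."""
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return last_page - first_page + 1


if __name__ == "__main__":
    print(split_address(0x12345))
    print(pages_spanned(4000, 200))
```

Each distinct page number produced this way needs its own virtual-to-physical translation, so a buffer that straddles a page boundary needs two translations even if it is small.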
Cluster OpenMP enables the use of the OpenMP shared memory programming model on clusters. Intel has released ...
Abstract. The scalability of an OpenMP program in a ccNUMA system with a large number of processors ...
This paper makes two important contributions. First, the paper investigates the performance implicat...
This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA ar...
This paper describes transparent mechanisms for emulating some of the data distribution facilities ...
It is well known that, although cc-NUMA architectures allow construction of large scale shared memor...
This paper presents user-level dynamic page migration, a runtime technique which transparently enabl...
The fast emergence of OpenMP as the preferable parallel programming paradigm for small-to-medium sca...
Locality of computation is key to obtaining high performance on a broad variety of parallel architec...
Abstract: A key problem for shared-memory systems is unpredictable performance. A critical influence...