This paper studies application performance on systems with strongly non-uniform remote memory access. In current-generation NUMA systems, the speed difference between the slowest and fastest link in an interconnect (the "NUMA gap") is typically less than an order of magnitude, and many conventional parallel programs achieve good performance. We study how different NUMA gaps influence application performance, up to and including typical wide-area latencies and bandwidths. We find that for gaps larger than those of current-generation NUMA systems, the performance of applications designed for a uniform-access interconnect suffers considerably. For many applications, however, performance can be greatly improved with comparatively simple c...
Many parallel systems offer a simple view of memory: all storage cells are addressed uniformly. Desp...
An important aspect of workload characterization is understanding memory system performance...
The authors approach network design from the perspective of the applications and ask how much networ...
In scalable multiprocessor architectures, the times required for a processor to access various porti...
This work provides a systematic study of the impact of communication performance on parallel applic...
CC-NUMA architectures have become extremely popular by providing fast and transparent access to data...
The latency of memory access times is hence non-uniform, because it depends on where the request ori...
The goal of this paper is to gain insight into the relative performance of communication mechanisms ...
Scalable multiprocessors that support a shared-memory image to application programmers are typically...
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory m...
In the early years of parallel computing research, significant theoretical studies were done on inte...
Interconnection networks are one of the fundamental components of a supercomputing facility, and one...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...