This whitepaper studies the challenges of performance scaling on large-scale shared-memory systems. Our experiments are performed on a large ccNUMA machine that consists of 72 IBM 3755 nodes connected with NumaConnect and provides shared memory across a total of 1728 cores, a core count far beyond that of conventional server platforms. As benchmarks, we select three data-intensive, memory-bound applications with different communication patterns: Jacobi, CSR SpMV and Floyd-Warshall. Our results illustrate the need for NUMA-aware design and implementation of shared-memory parallel algorithms in order to scale to high core counts. At the same time, we observed that, depending on its communication pattern, a...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
grantor: University of Toronto. This dissertation considers the design and analysis of NUMAc...
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
The latency of memory access times is hence non-uniform, because it depends on where the request ori...
While the growing number of cores per chip allows researchers to solve larger scientific and enginee...
The increasing number of cores per processor is making multicore-based systems pervasive. This i...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Many-core systems are a common p...
Abstract: To scale up to high-end configurations, shared memory multiprocessors are evolvin...
In scalable multiprocessor architectures, the times required for a processor to access various porti...
Abstract: Currently, parallel platforms based on large-scale hierarchical shared memory multiprocesso...
Nonuniform memory access time (referred to as NUMA) is an important feature in the design of large s...
As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, ther...