Sequential applications can take advantage of multi-core systems by way of pipeline parallelism to improve their performance. In such parallelism, core-to-core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core-to-core communication system based on batch processing of whole cache lines. BatchQueue is able to send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and needs only 2 full cache lines plus 3 byte-sized variables (each on a different cache line for optimal performance) to work. The characteristics of BatchQueue (high throughput and increased latency resulting from its batch processing) make it well suited for highly communicative task...
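The memory layout described in this abstract (two full cache lines of payload plus three byte-sized variables, each on its own line) can be illustrated with a minimal C11 sketch. This is not the authors' implementation: the names `batch_queue`, `bq_init`, `bq_push`, `bq_pop` and `pending` are hypothetical, the 64-byte line size is an assumption, and the hand-off protocol (a single shared flag that marks one published batch) is only one plausible way to synchronize two buffers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u                         /* assumed cache-line size in bytes */
#define BATCH (CACHE_LINE / sizeof(uint32_t))  /* 16 words fill one line */

static_assert(BATCH * sizeof(uint32_t) == CACHE_LINE, "one batch fills one line");

/* Hypothetical layout: 2 payload lines + 3 byte-sized variables,
 * each variable alone on its own cache line to avoid false sharing. */
typedef struct {
    _Alignas(CACHE_LINE) uint32_t buf[2][BATCH];  /* the two batch buffers */
    _Alignas(CACHE_LINE) atomic_uchar pending;    /* shared: 1 = a batch is published */
    _Alignas(CACHE_LINE) uint8_t prod_pos;        /* producer-private slot, 0..31 */
    _Alignas(CACHE_LINE) uint8_t cons_pos;        /* consumer-private slot, 0..31 */
} batch_queue;

void bq_init(batch_queue *q) {
    atomic_init(&q->pending, 0);
    q->prod_pos = 0;
    q->cons_pos = 0;
}

/* Producer side. Only the last word of a batch touches the shared flag:
 * it waits until the consumer has released the previous batch, then publishes. */
bool bq_push(batch_queue *q, uint32_t value) {
    uint8_t pos = q->prod_pos;
    bool last = (pos % BATCH) == BATCH - 1;
    if (last && atomic_load_explicit(&q->pending, memory_order_acquire) != 0)
        return false;                             /* previous batch not consumed yet */
    q->buf[pos / BATCH][pos % BATCH] = value;
    if (last)
        atomic_store_explicit(&q->pending, 1, memory_order_release);
    q->prod_pos = (uint8_t)((pos + 1) % (2 * BATCH));
    return true;
}

/* Consumer side. The flag is read once at the start of a batch and
 * cleared once after the last word of that batch has been read. */
bool bq_pop(batch_queue *q, uint32_t *out) {
    uint8_t pos = q->cons_pos;
    if (pos % BATCH == 0 &&
        atomic_load_explicit(&q->pending, memory_order_acquire) == 0)
        return false;                             /* no complete batch available */
    *out = q->buf[pos / BATCH][pos % BATCH];
    if (pos % BATCH == BATCH - 1)
        atomic_store_explicit(&q->pending, 0, memory_order_release);
    q->cons_pos = (uint8_t)((pos + 1) % (2 * BATCH));
    return true;
}
```

In this sketch a consumer sees nothing until a full batch of 16 words has been published, which is the latency cost the abstract mentions, while the shared flag is touched only twice per batch on each side, which is where the throughput gain would come from. Both functions are non-blocking and return `false` when the caller would have to wait, so a real producer or consumer thread would retry or spin.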
As high-performance computing (HPC) systems advance towards exascale (10^18 operations per second), ...
Data processing pipelines normally use lockless Single-Producer–Single-Consumer (SPSC) queues to eff...
Shared-memory multiprocessors are becoming increasingly popular as a high-performance, easy to progr...
Among the various paradigms of parallelization, pipeline parallelism has the advantage of maintainin...
In the context of multicore programming, pipeline parallelism is a solution to...
This paper presents the design and evaluation of the M-cache, a small, fast and intelligent memory f...
Many-core processors provide the raw computation power required by modern high-performance ...
Single chip multicore processors are now prevalent and processors with hundreds of cores are being p...
Core-to-core communication is critical to the effective use of multi-core processors. A number of so...
If the trend of integrating more and more cores to a single die continues, general-purpose processor...
Many-core processors provide the raw computation power required by modern high-performance multimedi...
Designing high-performance software queues for fast intercore communication is challenging, but crit...
Shared memory systems generally support consumer-initiated communication; when a process needs data,...
The difference between emerging many-core architectures and their multi-core predecessors goes beyon...
This paper describes the design of a basic communication run-time library for the UPC parallel langu...