Data processing pipelines commonly use lockless Single-Producer–Single-Consumer (SPSC) queues to decouple their processing threads efficiently and achieve high throughput while minimizing the cost of synchronization. SPSC queues have been widely studied, mostly for applications such as streaming data or network monitoring, where the main goal is maximizing throughput. Many applications, such as communication between virtual machines, software-defined networking, and message-based kernels, now also require low latency, and the tradeoffs between high-throughput and low-latency algorithms have not been studied equally well. Furthermore, at high or variable transaction rates, the effect of memory hierarchies and cache coh...
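As a rough illustration of the kind of queue the abstract refers to, the following is a minimal sketch of a bounded, Lamport-style SPSC ring buffer. It is not the algorithm studied in the paper; the names `SpscRing`, `try_push`, `try_pop`, and `run_pipeline` are hypothetical and chosen only for this example. The sketch relies on CPython's global interpreter lock for visibility between threads; an implementation in C, C++, or Rust would instead need atomic indices with acquire/release ordering and cache-line padding.

```python
import threading
import time


class SpscRing:
    """Bounded ring buffer for exactly one producer and one consumer thread."""

    def __init__(self, capacity):
        # One slot stays unused so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        """Producer side: return False instead of blocking when full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: return (False, None) instead of blocking when empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference held by the ring
        self._head = (self._head + 1) % len(self._slots)
        return True, item


def run_pipeline(n_items, capacity=8):
    """Move n_items integers from a producer thread to a consumer thread."""
    ring = SpscRing(capacity)
    received = []

    def producer():
        for i in range(n_items):
            while not ring.try_push(i):
                time.sleep(0)  # queue full: yield to the consumer

    def consumer():
        while len(received) < n_items:
            ok, item = ring.try_pop()
            if ok:
                received.append(item)
            else:
                time.sleep(0)  # queue empty: yield to the producer

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because each index is written by only one thread, no lock is needed: the producer owns the tail and the consumer owns the head. The throughput and latency tradeoffs mentioned in the abstract concern how and when those shared indices and slots move between the two cores' caches, which this sketch does not attempt to model.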
The performance gap between processor and memory remains a major performance bottleneck ...
We analyze the performance of CPU-bound network servers and demonstrate experi...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel pr...
Using efficient point-to-point communication channels is critical for implementing fine grained para...
In applications such as sharded data processing systems, data flow programming and load sharing appl...
A lock-free FIFO queue data structure is presented in this paper. The algorithm supports multiple pr...
The rapid progress of multi-/many-core architectures has caused data-intensive parallel applications...
As core counts increase and as heterogeneity becomes more common in parallel computing, we face the ...
Core-to-core communication is critical to the effective use of multi-core processors. A number of so...
Software distributed shared memory (DSM) platforms on networks of workstations tolerate large networ...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A...
In this paper, we investigate the design of highly efficient and scalable staged event-driven middle...
For a parallel architecture to scale effectively, communication latency between proce...