Recent I/O technologies such as PCI Express and 10 Gb Ethernet enable unprecedented levels of I/O bandwidth in mainstream platforms. However, in traditional architectures, memory latency alone can limit processors from matching 10 Gb inbound network I/O traffic. We propose a platform-wide method called Direct Cache Access (DCA) to deliver inbound I/O data directly into processor caches. We demonstrate that DCA provides a significant reduction in memory latency and memory bandwidth for receive-intensive network I/O applications. Analysis of benchmarks such as SPECweb99, TPC-W and TPC-C shows that overall benefit depends on the relative volume of I/O to memory traffic as well as the spatial and temporal relationship between processor and I/O m...
Cache injection is a viable technique to improve the performance of data-intensive parallel applicat...
The long latencies introduced by remote accesses in a large multiprocessor can be hidden by caching....
The gap between CPU and main memory speeds has long been a performance bottleneck. As we move toward...
Memory access is the major bottleneck in realizing multi-hundred-gigabit networks with commodity har...
The exploration of techniques to accelerate big data applications has been an active area of research...
Ethernet continues to be the most widely used network architecture today due to its low cost and bac...
Exponential link bandwidth increase over the past decade has sparked off interest in increasingly co...
Increased peripheral performance is causing strain on the memory subsystem of modern processors. For...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) ...
With the emergence of data-intensive applications, recent years have seen a fast-growing volume of I...
With the emergence of data-intensive applications, recent years have seen a fast-growing volume of I...
Keeping up with modern high-bandwidth networks is a significant challenge for system designers. A ke...
Abstract—Memory channel contention is a critical performance bottleneck in modern systems that have...
In modern (Intel) processors, Last Level Cache (LLC) is divided into multiple slices and an undocume...
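The slice-selection scheme mentioned in the entry above is commonly modeled in the reverse-engineering literature as XOR-folding of physical address bits: each bit of the slice index is the parity of the address ANDed with a fixed mask. A minimal sketch of that model follows; the bit masks here are purely hypothetical placeholders, not the real (undocumented) Intel hash.

```python
# Toy model of LLC slice selection: each slice-index bit is the parity
# (XOR) of a subset of physical address bits. The masks are HYPOTHETICAL,
# chosen only so that cache-line offset bits (0-5) never affect the result.
HYPOTHETICAL_MASKS = [0x5A5AA540, 0xA5A55280]  # one mask per slice-index bit

def slice_index(phys_addr: int, masks=HYPOTHETICAL_MASKS) -> int:
    idx = 0
    for bit, mask in enumerate(masks):
        parity = bin(phys_addr & mask).count("1") & 1  # parity of selected bits
        idx |= parity << bit
    return idx

# Addresses differing only within the 64-byte line offset map to the same slice.
print(slice_index(0x12345000) == slice_index(0x12345030))  # True
```

With two index bits this model distributes lines over four slices; real parts use one mask per bit of log2(slice count).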
Graduation date: 2015. I/O transactions within a computer system have evolved along with other system ...