On modern hardware architectures, the performance of Flux Reconstruction (FR) methods can be limited by memory bandwidth. A typical implementation structures these methods as a chain of distinct kernels, and often a dataset that one kernel has just written to main memory is immediately read back by the next kernel. One way to avoid this redundant use of memory bandwidth is kernel fusion. In practice, however, kernel fusion requires the source code of every kernel to be available, which rules out calls to certain third-party library functions, and it can add substantial complexity to a codebase. An alternative to full kernel fusion is cache blocking. But for this to be effective, the CPU cache has to be mean...
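The contrast between an unfused kernel chain and cache blocking can be illustrated with a minimal Python sketch. It assumes a hypothetical two-kernel chain: `kernel_a`, `kernel_b`, `run_unblocked` and `run_blocked` are illustrative names, not taken from any FR code. Both versions keep the two kernels as separate functions. The blocked version calls them on one block at a time, so the intermediate written by the first kernel is consumed by the second while it would still be resident in cache. Pure Python lists do not exhibit the actual cache effect. The sketch only shows the loop structure that a compiled (C, C++, CUDA) implementation would use.

```python
from typing import List


# Hypothetical stand-ins for two consecutive kernels in a chain.
def kernel_a(u: List[float], tmp: List[float], lo: int, hi: int) -> None:
    """First kernel: writes an intermediate value for entries lo..hi-1 into tmp."""
    for i in range(lo, hi):
        tmp[i - lo] = 2.0 * u[i]


def kernel_b(u: List[float], tmp: List[float], out: List[float], lo: int, hi: int) -> None:
    """Second kernel: consumes the intermediate produced by kernel_a."""
    for i in range(lo, hi):
        out[i] = tmp[i - lo] + u[i] * u[i]


def run_unblocked(u: List[float]) -> List[float]:
    """Unfused chain: the full-size intermediate is written, then re-read, from main memory."""
    n = len(u)
    tmp = [0.0] * n
    out = [0.0] * n
    kernel_a(u, tmp, 0, n)
    kernel_b(u, tmp, out, 0, n)
    return out


def run_blocked(u: List[float], block: int) -> List[float]:
    """Cache-blocked chain: same two kernels, applied one block at a time."""
    n = len(u)
    # Block-sized intermediate; in a compiled code this would be sized to fit in cache.
    tmp = [0.0] * block
    out = [0.0] * n
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        kernel_a(u, tmp, lo, hi)        # still two separate kernels...
        kernel_b(u, tmp, out, lo, hi)   # ...but tmp is re-read while still hot
    return out
```

For example, with `u = [float(i) for i in range(10)]`, `run_blocked(u, 4)` returns the same list as `run_unblocked(u)`. Because the kernels stay separate, either one could in principle be replaced by a library call operating on a single block. That separation is what distinguishes cache blocking from full kernel fusion. The block size of 4 is purely illustrative. In practice it would be chosen so that the per-block working set fits in the targeted cache level.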
Applications that exhibit regular memory access patterns usually benefit transparently from hardware...
Numerous advancements made in the field of computational sciences have made CFD a viable solution to...
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD...
Many current computer designs employ caches and a hierarchical memory architecture. The speed of a...
Many state-of-the-art CFD codes that exhibit low computational intensity (flops per RAM access) "sat...
Cache blocking is a technique widely used in scientific computing to minimize the exchange of inform...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
Physics-based simulation, Computational Fluid Dynamics (CFD) in particular, has substantially reshap...
This paper presents a number of optimisations for improving the performance of unstructured computat...
Increasingly, the main bottleneck limiting performance on emerging multi-core and many-core...
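The memory-bandwidth limitation described for FR kernel chains can be quantified as computational intensity (flops per byte of main-memory traffic). That intensity can then be compared against machine limits using the roofline model. The roofline model is a standard performance model brought in here for illustration and is not named in the source. The following is a minimal sketch under loudly stated assumptions. The kernel chain is hypothetical: a first kernel does 1 flop per element and writes an intermediate, and a second does 2 flops per element and reads it back. Values are 8-byte doubles, write-allocate traffic is ignored, and the machine figures (1000 GFLOP/s peak, 100 GB/s bandwidth) are invented.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision (8-byte) values


def intensity(flops_per_elem: int, transfers_per_elem: int) -> float:
    """Computational intensity: flops per byte moved to/from main memory."""
    return flops_per_elem / (transfers_per_elem * BYTES_PER_DOUBLE)


def roofline_gflops(intensity_fpb: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound on attainable performance: min(peak, bandwidth * intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_fpb)


# Hypothetical chain: kernel A does 1 flop per element, kernel B does 2.
FLOPS = 3

# Unfused, unblocked: A reads u and writes t; B reads t, reads u, writes out.
# That is 5 main-memory transfers per element (write-allocate traffic ignored).
unblocked = intensity(FLOPS, transfers_per_elem=5)

# Cache-blocked: t and the second read of u are served from cache,
# leaving 1 read of u and 1 write of out per element.
blocked = intensity(FLOPS, transfers_per_elem=2)

# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0
for name, ai in [("unblocked", unblocked), ("blocked", blocked)]:
    bound = roofline_gflops(ai, PEAK, BW)
    print(f"{name:9s}  intensity={ai:.4f} flop/B  bound={bound:.2f} GFLOP/s")
```

The script reports intensities of 0.0750 and 0.1875 flop/B, with bounds of 7.50 and 18.75 GFLOP/s. Both bounds are far below the assumed peak, which is what "bandwidth-bound" means in this model. Under these assumptions, removing the round trip of the intermediate through main memory raises the attainable bound by a factor of 2.5 without changing the arithmetic.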