This paper presents a number of optimisations for improving the performance of unstructured computational fluid dynamics codes on multicore and manycore architectures such as the Intel Sandy Bridge, Broadwell and Skylake CPUs and the Intel Xeon Phi Knights Corner and Knights Landing manycore processors. We discuss and demonstrate their implementation in two distinct classes of computational kernels: face-based loops, represented by the computation of fluxes, and cell-based loops, represented by updates to state vectors. We highlight the importance of making efficient use of the underlying vector units in both classes of computational kernels, with special emphasis on the changes required for vectorising face-based loops and their intrinsic indirect...
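To make the distinction between the two kernel classes concrete, the following is a minimal sketch in C, not the implementation from the paper. It assumes a hypothetical scalar state per cell, a toy averaged flux, and a hypothetical block width `VLEN`; the names `face_flux_loop`, `cell_update_loop`, `left`, `right`, `q` and `res` are all illustrative. The face loop is split into gather, compute and scatter stages, which is one common way of exposing a unit-stride inner loop to the compiler despite the indirect addressing.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 8 /* hypothetical block width in doubles; an assumption, tune per target */

/* Face-based loop: every face reads its two adjacent cells through the
 * index arrays left[] and right[] (indirect addressing) and accumulates
 * a flux into both cells. The flux formula is a toy placeholder. */
void face_flux_loop(size_t nfaces, const int *left, const int *right,
                    const double *q, double *res)
{
    assert(left && right && q && res);
    for (size_t f0 = 0; f0 < nfaces; f0 += VLEN) {
        size_t n = (nfaces - f0 < VLEN) ? nfaces - f0 : VLEN;
        double ql[VLEN], qr[VLEN], flux[VLEN];

        /* Gather: indirect reads copied into contiguous local buffers. */
        for (size_t i = 0; i < n; ++i) {
            ql[i] = q[left[f0 + i]];
            qr[i] = q[right[f0 + i]];
        }

        /* Compute: unit stride and no cross-iteration dependencies,
         * so a compiler is free to vectorise this loop. */
        for (size_t i = 0; i < n; ++i)
            flux[i] = 0.5 * (ql[i] + qr[i]);

        /* Scatter: kept serial because two faces in the same block
         * may share a cell, which would be a write conflict in SIMD. */
        for (size_t i = 0; i < n; ++i) {
            res[left[f0 + i]]  -= flux[i];
            res[right[f0 + i]] += flux[i];
        }
    }
}

/* Cell-based loop: direct, unit-stride access to the state vector,
 * which is straightforward to vectorise. Resets the residual after use. */
void cell_update_loop(size_t ncells, double dt,
                      double *restrict q, double *restrict res)
{
    for (size_t c = 0; c < ncells; ++c) {
        q[c] += dt * res[c];
        res[c] = 0.0;
    }
}
```

For a three-cell, two-face chain (`left = {0, 1}`, `right = {1, 2}`), calling `face_flux_loop` and then `cell_update_loop` advances the state by one explicit step. Keeping the scatter stage serial is the simplest safe choice; reordering or colouring faces so that no two faces in a block touch the same cell is an alternative that would let the scatter vectorise as well.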
In the unstructured finite volume method, loops over different mesh components such as cells, face...
Physics-based simulation, Computational Fluid Dynamics (CFD) in particular, has substantially reshap...
This paper highlights a three-year project by an interdisciplinary team on a legacy F77 computationa...
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range...
In this paper, we present optimization techniques that are crucial t...
This thesis presents a number of optimisations used for mapping the underlying computational pattern...
This article demonstrates the utility and implementation of software prefetching in an unstructured ...
Graphics Processing Units (GPUs) have shown acceleration factors over multicores for stru...
Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory paralleliz...
Achieving optimal performance on the latest multi-core and many-core architectures depends more and ...
This thesis documents the analysis and optimization of a high-order finite difference computational ...
Achieving optimal performance on the latest multi-core and many-core architectures increasingly depe...