This paper presents a number of optimisations for improving the performance of unstructured computational fluid dynamics codes on multicore and manycore architectures, such as the Intel Sandy Bridge, Broadwell and Skylake CPUs and the Intel Xeon Phi Knights Corner and Knights Landing manycore processors. We discuss and demonstrate their implementation in two distinct classes of computational kernels: face-based loops, represented by the computation of fluxes, and cell-based loops, represented by updates to state vectors. We show the importance of making efficient use of the underlying vector units in both classes of computational kernels, with special emphasis on the changes required for vectorising face-based loops and their intrinsic indirect...
Physics-based simulation, Computational Fluid Dynamics (CFD) in particular, has substantially reshap...
In the unstructured finite volume method, loop on different mesh components such as cells, face...
This paper highlights a three-year project by an interdisciplinary team on a legacy F77 computationa...
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range...
In this paper, we present optimization techniques that are crucial t...
This thesis presents a number of optimisations used for mapping the underlying computational pattern...
This article demonstrates the utility and implementation of software prefetching in an unstructured ...
Graphics Processing Units (GPUs) have shown acceleration factors over multicores for stru...
Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory paralleliz...
Achieving optimal performance on the latest multi-core and many-core architectures depends more and ...
Achieving optimal performance on the latest multi-core and many-core architectures increasingly depe...
This thesis documents the analysis and optimization of a high-order finite difference computational ...