In this paper we compare several runtimes that can be used to offload computationally intensive kernels to Intel Xeon Phi coprocessors. The benchmark application is a stripped-down version of an iterative solver used within the Schur complement finite or boundary element tearing and interconnecting (FETI, BETI) domain decomposition methods, in which the sparse solve with local stiffness matrices is replaced by multiplication with dense matrices in order to exploit coalesced memory access patterns. We present offload approaches based on the Intel Language Extension for Offload (LEO), the Hetero Streams Library (hStreams), and Heterogeneous Active Messages (HAM), and compare their performance and ease of use.
In this whitepaper, we propose outer-product-parallel and inner-product-parallel sparse matrix-matri...
Obtaining exascale performance is a challenge. Although the technology of today features har...
This paper presents preliminary performance comparisons of parallel applications developed...
This paper describes an approach for acceleration of the Hybrid Total FETI (HTFETI) domain decomposi...
Intel Xeon Phi is a coprocessor with sixty-one cores in a single chip. The chip has a more powerful ...
We investigate a domain decomposition method (DDM) of finite element method (FEM) using Intel's...
Intel Xeon Phi is a recently released high-performance co-processor which features 61 core...
In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Inte...
In this paper we describe different applications we have ported to Intel Xeon Phi architectures, ana...
The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matr...
In this paper, we propose a lightweight optimization methodology for the ubiquitous sparse matrix-ve...
In the paper we study the performance of the regularized boundary element quadrature routines implem...
This work describes the challenges presented by porting parts of the Gysela code to the In...
The Intel® Xeon Phi™ is the first processor based on Intel’s MIC (Many Integrated Cores) architect...
This work describes the challenges presented by porting parts of the gysela co...