In this paper, we studied the NVIDIA GPU architecture characteristics relevant to the SGEMM routine, and the potential peak performance of SGEMM on the Fermi GPU. Guided by this analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS from the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in CUDA runtime source code.
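The NN/TN/NT/TT labels above most likely follow the BLAS `transa`/`transb` convention, where each letter says whether A or B is used as stored ('N') or transposed ('T'). Under that assumption, the following host-side reference sketch shows what each variant computes, C = alpha * op(A) * op(B) + beta * C in column-major storage. It is a naive triple loop for checking results, not the authors' assembly-tuned Fermi kernel, and the names `sgemm_ref` and `elem` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element (i, j) of a column-major matrix with leading dimension ld. */
static float elem(const float *M, int ld, int i, int j)
{
    return M[i + (size_t)j * ld];
}

/* Hypothetical naive reference SGEMM (BLAS-style, column-major):
   C = alpha * op(A) * op(B) + beta * C, where op(X) = X for 'N' and X^T for 'T'.
   C is m x n, op(A) is m x k, op(B) is k x n. */
void sgemm_ref(char transa, char transb, int m, int n, int k,
               float alpha, const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    assert(transa == 'N' || transa == 'T');
    assert(transb == 'N' || transb == 'T');
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* op(A)(i, p): read A(i, p), or A(p, i) when A is transposed. */
                float a = (transa == 'N') ? elem(A, lda, i, p) : elem(A, lda, p, i);
                /* op(B)(p, j): read B(p, j), or B(j, p) when B is transposed. */
                float b = (transb == 'N') ? elem(B, ldb, p, j) : elem(B, ldb, j, p);
                acc += a * b;
            }
            float *c = &C[i + (size_t)j * ldc];
            /* As in BLAS, C is not read when beta == 0, so it may be uninitialized. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

For example, with A = [[1, 2], [3, 4]] stored as `{1, 3, 2, 4}` and B = [[5, 6], [7, 8]] stored as `{5, 7, 6, 8}`, the NN call `sgemm_ref('N', 'N', 2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2)` fills C with `{19, 43, 22, 50}`. The transposed variants read the same arrays with swapped indices, which is why their memory access patterns, and hence their tuned GPU performance, differ.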
This paper presents a novel optimizing compiler for general-purpose computation on graphics processi...
We show that most performance improvements in GPUs increase the number of exec...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
In this paper, we present an approach to estimate GPU applications' performanc...
This thesis work is funded by the ANR PetaQCD project. We worked mainly on two topics of GPU pe...
In this paper we discuss our experiences in improving the performance of GEMM (both single and...
This paper analyzes several aspects regarding the improvement of software performance for applicatio...
Computing on graphics processors is perhaps one of the most important developments in computational sc...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
In this paper we discuss our experiences in improving the performance of two key algorithms: t...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
A GPU based on the CUDA architecture developed by NVIDIA is a high-performance computing device...
Commodity clusters augmented with application accelerators are evolving as competitive high performa...