In this paper, we studied the NVIDIA GPU architecture characteristics relevant to the SGEMM routine and the potential peak performance of SGEMM on Fermi GPUs. Guided by this analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in the CUDA runtime source code.
In the last three years, GPUs are more and more being used for general purpose applications instead ...
Graphics processor units (GPUs) today can be used for computations that go beyond graphics and such...
This paper presents a novel optimizing compiler for general purpose computation on graphics processi...
In this paper, we present an approach to estimate GPU applications' performanc...
This thesis work is funded by the ANR PetaQCD project. We have mainly worked on two topics of GPU pe...
In this paper we discuss about our experiences in improving the performance of GEMM (both single and...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
This paper analyzes several aspects regarding the improvement of software performance for applicatio...
GPU based on CUDA Architecture developed by NVIDIA is a high performance computing device...
Computing on graphics processors is maybe one of the most important developments in computational sc...
In this paper we discuss about our experiences in improving the performance of two key algorithms: t...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
High performance Computing is increasingly being done on parallel machines like GPUs. In my work, I ...