In this paper we discuss our experience in improving the performance of GEMM (both single and double precision) on the Fermi architecture using CUDA, and how new Fermi features such as the cache affect performance. We find that the addition of a cache to the GPU on the one hand helps the processors exploit data locality that arises at run time, but on the other hand makes the dependence of performance on algorithmic parameters less predictable. Auto-tuning then becomes a useful technique for addressing this issue. Our auto-tuned SGEMM and DGEMM reach 563 GFlops and 253 GFlops, respectively, on a Tesla C2050. The design and implementation use only CUDA and C and have not benefited from tuning at the level of binary code. © 2010 IEEE.
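To make the auto-tuning idea concrete, here is a minimal sketch of an empirical search over one algorithmic parameter. It is an illustration only, not the authors' method: it runs on the CPU in plain Python rather than CUDA, the single tuned parameter (a square tile size) is an assumption, and the names `gemm_blocked` and `autotune_tile` are hypothetical. The point it shows is that when cache effects make the best parameter hard to predict, each candidate can simply be timed and the fastest kept.

```python
import time


def gemm_blocked(a, b, n, tile):
    """Return C = A * B for n x n matrices (lists of rows), computed tile by tile."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                # Multiply one tile of A by one tile of B and accumulate into C.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def autotune_tile(n, candidates, repeats=3):
    """Time every candidate tile size; return (best_tile, gflops_by_tile)."""
    # Hypothetical test matrices; any dense input would do.
    a = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    b = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    flops = 2.0 * n ** 3  # one multiply and one add per inner iteration
    gflops = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            gemm_blocked(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        # Guard against a zero reading from a coarse timer.
        gflops[tile] = flops / max(best, 1e-12) / 1e9
    return max(gflops, key=gflops.get), gflops


if __name__ == "__main__":
    best_tile, rates = autotune_tile(n=48, candidates=[4, 8, 16, 48])
    print("best tile:", best_tile)
```

A real GPU tuner would search more parameters (for example thread-block shape and per-thread work) and time compiled kernels, but the structure of the search is the same: enumerate candidates, measure, keep the best. The winning tile size here depends on the machine, so no particular output is expected.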
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
GPUs have become popular due to their high computational power. Data scientists rely on GPUs to proc...
General purpose computing on graphics processing units (GPGPU) is fast becoming a common feature of ...
The development of high performance dense linear algebra (DLA) critically depends on highly optimize...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
In this paper, we studied the NVIDIA GPU architecture characteristics concerning the SGEMM routine a...
In this paper, we present an approach to estimate GPU applications' performanc...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
In order to utilize the tremendous computing power of graphics hardware and to automatically adapt t...
Graphics hardware’s performance is advancing much faster than the performance of conventional microp...
Sparse matrix-vector multiplication is an integral part of many scientific algorithms. Several studi...
Graphics hardware's performance is advancing much faster than the performance of conventional microp...
The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resourc...
In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multip...
In this paper we discuss our experiences in improving the performance of two key algorithms: t...