In this paper, we present an approach to estimate the performance upper bound of GPU applications based on algorithm analysis and assembly-code-level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization space is left for SGEMM and why. According to our analysis, the nature of the Fermi (Kepler) instruction set and the limited issue throughput of the schedulers are the main factors limiting how closely SGEMM can approach the theoretical peak performance. The estimated upper-bound peak performance of SGEMM is around 82.5% of the theoretical peak performance on GTX580 Fermi ...
We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs. Our model id...
Data analysis has become very important with the growth of information today. There is a need for real-tim...
This paper presents a performance modeling and optimization analysis tool to predict and op...
In this paper, we studied the NVIDIA GPU architecture characteristics concerning the SGEMM routine a...
This thesis work is funded by the ANR PetaQCD project. We have mainly worked on two topics of GPU pe...
In this paper we discuss our experiences in improving the performance of GEMM (both single and...
Commodity clusters augmented with application accelerators are evolving as competitive high performa...
In this thesis work, we have mainly worked on two topics of GPU performance analysis. First, we hav...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
In this paper we discuss our experiences in improving the performance of two key algorithms: t...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
We optimized the Moving Particle Simulation (MPS) method for Kepler GPU. Solving sparse matrix o...