This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast the implementation effort and complexity of an optimized BLAS routine on CPU versus GPU, without considering performance. The contributions are a sample procedure for comparing BLAS kernel implementations, guidance on getting started with GPU libraries and offloading, an approach to analyzing their performance, and a discussion of the issues faced and how they were solved.

WPDP, XIII Workshop Procesamiento Distribuido y Paralelo. Red de Universidades con Carreras en Informática (RedUNCI).
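For reference on the operation this work studies: SSPR performs the symmetric packed rank-1 update A := alpha * x * xᵀ + A, storing only one triangle of A in a packed array. A minimal Python sketch of the upper-triangular, column-packed variant follows; the function name and pure-Python storage are illustrative, not the actual BLAS API.

```python
def sspr_upper(n, alpha, x, ap):
    """Symmetric packed rank-1 update: A := alpha * x * x^T + A.

    The upper triangle of A is stored column by column in ap,
    which must have length n*(n+1)//2. Updates ap in place.
    """
    k = 0  # running index into the packed array
    for j in range(n):          # column j of the upper triangle
        for i in range(j + 1):  # rows 0..j of that column
            ap[k] += alpha * x[i] * x[j]
            k += 1
    return ap
```

An optimized CPU or GPU implementation would block and parallelize these loops, but the packed indexing above is the semantic contract any SSPR variant must preserve.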
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
In the last ten years, GPUs have dominated the market considering the computin...
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
Scientific applications are some of the most computationally demanding software pieces. Their core i...
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The Gram-Schmidt method is a classical method for determining QR decompositions, which is commonly u...
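As context for the method named above, here is a minimal sketch of classical Gram-Schmidt QR factorization in Python/NumPy; the function name is illustrative, and it assumes the input has full column rank.

```python
import numpy as np

def classical_gram_schmidt(A):
    """QR decomposition via classical Gram-Schmidt on the columns of A.

    Returns Q (orthonormal columns) and upper-triangular R with A = Q R.
    Assumes A has full column rank.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # Project column j onto each previous orthonormal direction
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```

The inner projection loop is a sequence of dot products and AXPY-style updates, which is why Gram-Schmidt is commonly used as a BLAS-heavy benchmark kernel.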
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
As Central Processing Units (CPUs) and Graphical Processing Units (GPUs) get progressively better, d...
Kernel methods such as kernel principal component analysis and support vector machines have become p...
Nowadays GPUs have dominated the market considering the computing/power metric...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...