The effectiveness of Machine Learning (ML) methods depends on access to large, suitable datasets. In this article, we present how we build the LS-CAT (Large-Scale CUDA AutoTuning) dataset, sourced from GitHub, for the purpose of training NLP-based ML models. Our dataset includes 19 683 CUDA kernels focused on linear algebra. In addition to the CUDA codes, our LS-CAT dataset contains 5 028 536 associated runtimes, covering different combinations of kernels, block sizes and matrix sizes. The runtimes are GPU benchmarks on both Nvidia GTX 980 and Nvidia T4 systems. This information creates a foundation upon which NLP-based models can find correlations between source-code features and the optimal choice of thread block size. There are several results that ...
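As a concrete illustration of what a single LS-CAT runtime entry represents, the sketch below (not taken from the dataset's actual harness; the kernel, the matrix size, and the swept block sizes are illustrative assumptions) times one naive matrix-multiplication kernel with CUDA events while sweeping the thread block size, producing one (kernel, block size, matrix size) runtime per iteration:

```cuda
// Minimal benchmark sketch: time one kernel across several thread block
// sizes, as one row of LS-CAT-style runtime data per configuration.
// All names (matmul, n, blockSizes) are hypothetical, for illustration only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void matmul(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

int main() {
    const int n = 1024;                    // one example matrix size
    const int blockSizes[] = {8, 16, 32};  // thread block edge lengths to sweep
    size_t bytes = (size_t)n * n * sizeof(float);

    float *A, *B, *C;
    cudaMalloc(&A, bytes); cudaMalloc(&B, bytes); cudaMalloc(&C, bytes);
    cudaMemset(A, 0, bytes); cudaMemset(B, 0, bytes);  // deterministic inputs

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);

    for (int bs : blockSizes) {
        dim3 block(bs, bs);
        dim3 grid((n + bs - 1) / bs, (n + bs - 1) / bs);

        cudaEventRecord(start);
        matmul<<<grid, block>>>(A, B, C, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // One (kernel, block size, matrix size) -> runtime observation.
        printf("block %2dx%-2d: %.3f ms\n", bs, bs, ms);
    }

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Repeating such a sweep over many kernels and matrix sizes, on both a GTX 980 and a T4, yields the kind of source-to-runtime pairs an NLP-based model could learn block-size predictions from.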