At the heart of deep learning training and inference are computationally intensive primitives, such as convolutions, that form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high-performance implementations of deep learning kernels: 1) library development, exemplified by Intel MKLDNN for CPUs, and 2) automatic compilation, represented by the TensorFlow XLA compiler. Both approaches have drawbacks: a custom-built library can deliver very good performance, but the cost and time of developing it are high. Additionally, hand coding a plethora of operators for performance is not scalable over the long term as more and more deep learning operators get ...
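As a minimal sketch (not drawn from any of the works cited here), the following C program shows the kind of direct convolution loop nest that such hand-tuned libraries and DL compilers reorganize through tiling, vectorization, and parallelization; all function names, layouts, and sizes are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

/* Naive direct 2D convolution (NCHW layout, stride 1, no padding).
 * Illustrative only: this 7-deep loop nest is the compute-intensive
 * primitive that libraries (e.g. MKLDNN) hand-optimize and compilers
 * (e.g. XLA) generate optimized code for. */
static void conv2d_direct(const float *in, const float *w, float *out,
                          int N, int C, int H, int W,
                          int K, int R, int S)
{
    int OH = H - R + 1, OW = W - S + 1;
    for (int n = 0; n < N; n++)
        for (int k = 0; k < K; k++)
            for (int oh = 0; oh < OH; oh++)
                for (int ow = 0; ow < OW; ow++) {
                    float acc = 0.0f;
                    for (int c = 0; c < C; c++)
                        for (int r = 0; r < R; r++)
                            for (int s = 0; s < S; s++)
                                acc += in[((n * C + c) * H + (oh + r)) * W + (ow + s)]
                                     * w[((k * C + c) * R + r) * S + s];
                    out[((n * K + k) * OH + oh) * OW + ow] = acc;
                }
}

int main(void)
{
    /* Tiny assumed problem: 1 image, 3 input channels, 8x8, 4 filters of 3x3. */
    int N = 1, C = 3, H = 8, W = 8, K = 4, R = 3, S = 3;
    int OH = H - R + 1, OW = W - S + 1;
    float *in  = calloc((size_t)N * C * H * W, sizeof(float));
    float *w   = calloc((size_t)K * C * R * S, sizeof(float));
    float *out = calloc((size_t)N * K * OH * OW, sizeof(float));
    for (int i = 0; i < N * C * H * W; i++) in[i] = 1.0f;
    for (int i = 0; i < K * C * R * S; i++) w[i]  = 0.5f;
    conv2d_direct(in, w, out, N, C, H, W, K, R, S);
    printf("out[0] = %f\n", out[0]); /* expect 3*3*3*0.5 = 13.5 */
    free(in); free(w); free(out);
    return 0;
}

Even this straightforward form exposes the optimization space the two approaches compete over: which loops to tile, vectorize, or parallelize, and how to lay out the tensors in memory.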
Modern compilers offer more and more capabilities to automatically parallelize code-regions if these...
Recent advances in Deep Learning (DL) research have been adopted in a wide variety of applications, ...
Deep learning algorithms have seen success in a wide variety of applications, such as machine transl...
Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becomi...
Deep Neural Networks (DNN) are well understood to be one of the largest consumers of HPC resources a...
Thesis (Ph.D.)--University of Washington, 2022. As the scaling and performance demands for deep learni...
A wide range of scientific and machine learning applications depend on highly ...
Optimizing the implementation of tensor computations is essential to exploiting the full capacity of...
We present a library that provides optimized implementations for deep learning primitives. Deep lear...
In this article, a new method is provided for accelerating the execution of convolution layers in De...
Thesis (Ph.D.)--University of Washington, 2021. Seamless gains in performance from technology scaling ...
Deep learning frameworks automate the deployment, distribution, synchronizatio...
The recent “Cambrian explosion” of Deep Learning (DL) algorithms in concert with the end of Moore’s ...
This paper introduces TIRAMISU, a polyhedral framework designed to generate high performance code fo...
In recent years, machine learning has very much been a prominent talking point, and is considered by...