Deep Convolutional Neural Networks have found wide application, but their training time can be significant. We find that between successive training epochs, many neurons compute nearly the same output when presented with the same input. This presents an opportunity to skip computation in the forward pass of the later epoch via memoization. This dissertation explores the potential of such an approach by investigating the correlation of neuron activations between training epochs. We develop an implementation of activation memoization that takes into account the lockstep behavior of threads executing together on single-instruction, multiple-thread (SIMT) Graphics Processing Units (GPUs). Finally, we discuss the trade-off between speedup ...
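To make the idea concrete, here is a minimal sketch of epoch-to-epoch activation memoization for one fully connected layer. Everything in it (the partial-sum proxy, the tolerance `tol`, the per-sample cache) is an illustrative assumption, not the dissertation's implementation; it also ignores the SIMT lockstep issue, since in plain Python each neuron can decide independently whether to reuse its cached output.

```python
import numpy as np

def memoized_forward(x, W, cache, sample_id, k=16, tol=1e-3):
    """Forward pass of one ReLU layer with epoch-to-epoch memoization.

    cache maps sample_id -> (proxy, activations) saved on the previous
    epoch. A cheap proxy (partial dot product over the first k inputs)
    decides, per neuron, whether the cached activation can be reused.
    """
    proxy = W[:, :k] @ x[:k]                      # cheap partial sums, one per neuron
    if sample_id in cache:
        old_proxy, old_act = cache[sample_id]
        stable = np.abs(proxy - old_proxy) < tol  # neurons whose proxy barely moved
    else:
        stable = np.zeros(W.shape[0], dtype=bool) # first epoch: compute everything
        old_act = np.empty(W.shape[0])

    act = np.empty(W.shape[0])
    act[stable] = old_act[stable]                   # memoized: skip the full dot product
    act[~stable] = np.maximum(W[~stable] @ x, 0.0)  # recompute with ReLU
    cache[sample_id] = (proxy, act)                 # refresh for the next epoch
    return act

# Two "epochs" over the same sample: the first fills the cache, the
# second reuses activations of neurons whose weights barely changed.
cache = {}
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))
x = rng.standard_normal(784)
y1 = memoized_forward(x, W, cache, sample_id=0)  # full compute
W += 1e-6 * rng.standard_normal(W.shape)         # tiny weight update
y2 = memoized_forward(x, W, cache, sample_id=0)  # mostly memoized
```

On a SIMT GPU this per-neuron branching would cause warp divergence: threads that skip work would idle while their warp-mates recompute, erasing the savings. That is the abstract's point about lockstep execution; a practical implementation must make the reuse decision at a granularity that keeps all threads of a warp on the same path.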
To break the three lockings during the backpropagation (BP) process for neural network training, multipl...
Recurrent Neu...
Parallel hardware accelerators, for example Graphics Processing Units, have limited on-chip memory ca...
Deep learning is a branch of machine learning that aims to extract multiple simple features from da...
Deep learning is an important component of big-data analytic tools and intelligent applications, suc...
I present a new way to parallelize the training of convolutional neural networks across multiple GPU...
Deep convolutional neural networks (ConvNets), which are at the heart of many new emerging applicati...
When a Convolutional Neural Network is used for on-the-fly evaluation of continuously updating time...
The focus of this paper is speeding up the evaluation of convolutional neural networks. While delive...
Supervised learning of Convolutional Neural Networks (CNNs), also known as supervised Deep Learning,...
DNNs have been finding a growing number of applications, including image classification, speech recog...
Convolutional neural networks (CNNs) have proven to be powerful classification tools in tasks th...
Nowadays, artificial neural networks (ANNs) can outperform the human brain in specific tasks...
Deep neural network models are commonly used in various real-life applications due to their high pre...