Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement.
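The abstract does not spell out the softmax algorithm itself, so the following is only a minimal sketch of the general idea behind an integer-only softmax, assuming a base-2, shift-based approximation of the exponential (exp(x) ~ 2**(x * log2(e))); it is not ITA's actual implementation. All names (integer_softmax, LOG2E_Q10, out_bits) are hypothetical, and the streaming/on-the-fly aspect is omitted: the sketch computes over a full vector for clarity.

import numpy as np

def integer_softmax(logits_q: np.ndarray, out_bits: int = 8) -> np.ndarray:
    """Integer-only softmax sketch over quantized logits.

    Approximates exp(x) by 2**(x * log2(e)): the constant log2(e) is
    applied as a fixed-point multiply, and the power of two becomes a
    right shift, so no floating-point unit is required. The result is
    unsigned fixed point that sums to roughly 2**out_bits.
    """
    x = logits_q.astype(np.int64)
    x = x - x.max()                    # subtract the max; exponents now <= 0
    LOG2E_Q10 = 1477                   # round(log2(e) * 2**10), Q10 constant
    shift = (-(x * LOG2E_Q10)) >> 10   # non-negative right-shift amounts
    ONE_Q16 = 1 << 16
    exp_q = np.right_shift(ONE_Q16, np.minimum(shift, 16))  # ~2**(-shift), Q16
    denom = exp_q.sum()                # >= ONE_Q16, since the max term is 1.0
    return ((exp_q << out_bits) // denom).astype(np.int32)

For example, integer_softmax(np.array([12, -3, 7], dtype=np.int8)) yields roughly [254, 0, 1], a fixed-point distribution summing to about 2**8. Replacing the exponential with shifts and integer multiplies is what makes an int8 datapath sufficient; a hardware version could additionally keep a running maximum and renormalize partial sums so the softmax is evaluated as scores stream in.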