Stacked self-attention models have received widespread attention due to their ability to capture global dependencies among words. However, stacking many layers and components introduces a huge number of parameters, leading to low parameter efficiency. In response to this issue, we propose a lightweight architecture named Continuous Self-Attention models with neural ODE networks (CSAODE). In CSAODE, continuous dynamical models (i.e., neural ODEs) are coupled with our proposed self-attention block to form a self-attention ODE solver. This solver continuously computes and refines the hidden states using only one layer of parameters, improving parameter efficiency. In addition, we design a novel accelerated continuous dynamical model to reduce compu...
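The abstract above only outlines the architecture, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' implementation: a single self-attention block defines the dynamics f(t, h) over hidden states, and a fixed-step solver integrates those dynamics while reusing the same one layer of parameters at every step. All names (SelfAttentionODEFunc, csaode_encode, n_steps) and the choice of an Euler solver are assumptions; CSAODE's actual solver and its acceleration scheme are not specified here.

```python
import torch
import torch.nn as nn


class SelfAttentionODEFunc(nn.Module):
    """Defines dh/dt = f(t, h) with a single self-attention layer (hypothetical sketch)."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, t: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # The derivative is the normalized attention output; t is unused here but
        # kept so the function matches the f(t, h) interface expected by ODE solvers.
        out, _ = self.attn(h, h, h)
        return self.norm(out)


def csaode_encode(func: nn.Module, h0: torch.Tensor,
                  t1: float = 1.0, n_steps: int = 8) -> torch.Tensor:
    """Fixed-step Euler integration of dh/dt = func(t, h) from t = 0 to t = t1.

    The same attention parameters are reused at every step, so the encoder has
    continuous "depth" while the parameter count stays at one block.
    """
    h = h0
    dt = t1 / n_steps
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        h = h + dt * func(t, h)
    return h


if __name__ == "__main__":
    x = torch.randn(2, 10, 64)              # (batch, sequence length, d_model)
    func = SelfAttentionODEFunc(d_model=64)
    print(csaode_encode(func, x).shape)     # torch.Size([2, 10, 64])
```

Because the same attention weights are applied at every integration step, the parameter count stays at one block no matter how finely the trajectory is discretized, which is the parameter-efficiency argument the abstract makes.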
Large pretrained language models using the transformer neural network architecture are becoming a do...
This research stu...
Recurrent n...
The self-attention mechanism is rapidly emerging as one of the most important key primit...
Although deep neural networks generally have fixed network structures, the concept of dynamic mechan...
Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a ...
Neural network models with attention mechanisms have shown their efficiency on various tasks. Howev...
In this paper, we aim to enhance self-attention (SA) mechanism for deep metric learning in visual pe...
Attention mechanism is crucial for sequential learning where a wide range of applications have been ...
The study of specialized accelerators tailored for neural networks is becoming a promising topic in ...
Self-attention-based networks have obtained impressive performance in parallel training and global c...
Many machine learning tasks are structured as sequence modeling problems, predominantly dealing with...
The self-attention model has shown its flexibility in parallel computation and its effectiveness on mode...
Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time count...
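As background for the neural-ODE references above (a standard observation, not a claim about any one of them): a residual update h_{k+1} = h_k + f(h_k, θ) can be read as one explicit Euler step of the ODE dh(t)/dt = f(h(t), t, θ), so a neural ODE is the continuous-depth limit of a residual network, with layer index replaced by integration time and a black-box solver choosing the step size.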
Considering the spectral properties of images, we propose a new self-attention mechanism with highly...