Self-attention mechanisms have recently attracted considerable attention in Natural Language Processing (NLP) tasks, and relative positional information is important to them. We propose a Faraway Mask, which focuses attention on the (2m + 1)-gram words around each position, and a Scaled-Distance Mask, which applies a logarithmic distance penalty, to avoid and to weaken self-attention over distant words, respectively. To exploit the different masks, we present a Positional Self-Attention Layer that generates the different masked self-attentions, followed by a Position-Fusion Layer in which fused positional information is multiplied with the masked self-attentions to generate sentence embeddings. To evaluate our sentence-embedding approach, the Multiple Positional Self-Attention Network (MPSAN), we pe...
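The two masks are described concretely enough to sketch. Below is a minimal NumPy illustration of how a (2m + 1)-gram Faraway Mask and a logarithmic Scaled-Distance Mask could be realized as additive terms on attention logits; the function names, the additive-mask formulation, and the softmax details are assumptions made for illustration, not the paper's reference implementation.

```python
import numpy as np

def faraway_mask(n, m):
    # Keep only the (2m + 1)-gram window: positions within distance m get 0,
    # everything else gets -inf, so the softmax assigns them zero weight.
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.where(dist <= m, 0.0, -np.inf)

def scaled_distance_mask(n):
    # Penalize logits by the log of the token distance, weakening (rather
    # than removing) attention to far-away words.
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return -np.log1p(dist.astype(float))

def masked_self_attention(q, k, v, mask):
    # Standard scaled dot-product self-attention with an additive mask.
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + mask
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Usage: n tokens with d-dimensional states; each mask yields one masked
# self-attention, which a fusion layer such as MPSAN's would then combine.
n, d, m = 8, 16, 2
x = np.random.randn(n, d)
hard = masked_self_attention(x, x, x, faraway_mask(n, m))      # hard cutoff
soft = masked_self_attention(x, x, x, scaled_distance_mask(n)) # soft decay
```

Under this reading, the Faraway Mask zeroes out distant positions entirely, while the Scaled-Distance Mask only down-weights them, which is consistent with the abstract's "avoid" versus "weaken" distinction.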
Linguistic steganalysis can indicate the existence of steganographic content in suspicious text carr...
Self-attention-based networks have obtained impressive performance in parallel training and global c...
Large pretrained language models using the transformer neural network architecture are becoming a do...
Recurrent neural nets (RNNs) and convolutional neural nets (CNNs) are widely used in NLP tasks to capt...
The self-attention model has shown its flexibility in parallel computation and its effectiveness in mode...
Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a ...
Document classification has broad applications in sentiment classification, document r...
Although deep neural networks generally have fixed network structures, the concept of dynamic mechan...
In this paper, we aim to enhance the self-attention (SA) mechanism for deep metric learning in visual pe...
Neural network models with attention mechanisms have shown their efficiency on various tasks. Howev...
Considering the spectral properties of images, we propose a new self-attention mechanism with highly...
Self-attention networks (SAN) have shown promising performance in various Natural Language Process...
In this paper, we introduce prior knowledge, namely multi-scale structure, into self-attention modules....
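The snippet above names multi-scale structure as prior knowledge but is cut off before describing the mechanism. One common way to impose such structure, sketched below purely as an illustration, is to give each attention head a different window radius; the scales, head layout, and function names here are assumptions, not necessarily the cited paper's design.

```python
import numpy as np

def window_mask(n, w):
    # Additive mask restricting attention to a window of radius w.
    idx = np.arange(n)
    return np.where(np.abs(idx[:, None] - idx[None, :]) <= w, 0.0, -np.inf)

def multi_scale_attention(x, scales=(1, 3, float("inf"))):
    # One hypothetical multi-scale layer: each "head" runs plain
    # self-attention under a different window radius (the unbounded scale
    # recovers global attention), and the heads are concatenated.
    n, d = x.shape
    heads = []
    for w in scales:
        mask = np.zeros((n, n)) if np.isinf(w) else window_mask(n, int(w))
        logits = x @ x.T / np.sqrt(d) + mask
        p = np.exp(logits - logits.max(axis=-1, keepdims=True))
        p /= p.sum(axis=-1, keepdims=True)
        heads.append(p @ x)
    return np.concatenate(heads, axis=-1)

x = np.random.randn(8, 16)
y = multi_scale_attention(x)  # shape (8, 48): one 16-dim head per scale
```

Mixing local and global heads in this way lets lower scales capture n-gram-like patterns while the unbounded head preserves the standard global receptive field.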