Sparse Transformers have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for multi-hop question answering (MHQA). Noting that the Transformer is a particular instance of a message-passing GNN, in this paper we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. We simplify existing GNN-based MHQA models and use this system to compare GNN architectures at lower compute cost than token-level models. Our results support the superiority of the Transformer architecture as a GNN in MHQA. We also investigate the role of graph sparsity, graph structure, and edge features in our GNNs. We find that task-specific graph structuring rules outperform the random conne...
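To make the framing concrete, the following is a minimal sketch (not taken from the paper) of how a single-head self-attention layer can be written as attention-weighted message passing over a graph: with a complete graph it reduces to standard dense attention, while a sparse adjacency gives the sparse-Transformer / GNN view discussed above. The function name, adjacency mask `A`, and projection matrices `Wq`, `Wk`, `Wv` are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch: self-attention as message passing over a graph.
# A all-ones (complete graph) -> standard dense attention;
# a sparse A -> attention restricted to graph edges (sparse Transformer view).
import numpy as np

def attention_message_passing(H, A, Wq, Wk, Wv):
    """H: (n, d) node features; A: (n, n) 0/1 adjacency; W*: (d, d) projections."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv           # per-node query/key/value
    scores = (Q @ K.T) / np.sqrt(H.shape[1])   # pairwise compatibility
    scores = np.where(A > 0, scores, -np.inf)  # messages only along edges
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over neighbours
    return weights @ V                          # aggregate neighbour messages

# Usage: a complete graph recovers dense self-attention.
rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))
A = np.ones((n, n))                             # fully connected -> Transformer
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(attention_message_passing(H, A, Wq, Wk, Wv).shape)  # (5, 8)
```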