Transformers have been established as one of the most effective neural approaches for a wide range of Natural Language Processing tasks. However, following the common trend in modern deep architectures, their scale has grown so quickly that training such models from scratch is no longer a realistic option for many enterprises. Indeed, despite their strong performance, Transformers have the general drawback of requiring huge amounts of training data, computational resources, and energy to be optimized successfully. For this reason, more recent architectures such as Bidirectional Encoder Representations from Transformers (BERT) rely on unlabeled data to pre-train the model, which is later fine-tuned for a specific downstream ...
Introducing factors, that is to say, word features such as linguistic information referring to the s...
Transformers are powerful for sequence modeling. Nearly all state-of-the-art language models and pre...
Given a large Transformer model, how can we obtain a small and computationally efficient model which...
Natural language processing (NLP) techniques have improved significantly with the introduction of pre-trained l...
Introducing factors such as linguistic features has long been proposed in machine translation to imp...
The utility of linguistic annotation in neural machine translation seemed to have been established in...
This chapter presents an overview of the state of the art in natural language processing, exploring ...
The Transformer model is a recent, fast, and powerful advance in neural machine translation. W...
Language Generation Models produce words based on the previous context. Although existing methods of...
Data augmentation methods for Natural Language Processing tasks have been explored in recent years; howeve...
End-to-end neural machine translation does not require us to have specialized knowledge of investiga...
In the last decade, the size of deep neural architectures employed in Natural Language Processing (NL...
The goal of my thesis is to investigate the most influential transformer architectures and to apply ...
We propose a novel Transformer encoder-based architecture with syntactical knowledge encoded for int...