Recent work suggests that transformer models are capable of multi-task learning on diverse NLP tasks. However, the potential of these models may be limited because they use the same set of parameters for all tasks. In contrast, humans tackle tasks in a more flexible way, making appropriate assumptions about which skills and knowledge are relevant and executing only the necessary computations. Inspired by this, we propose task-level mixture-of-experts models, which have a collection of transformer layers (i.e., experts) and a router component that chooses among these experts dynamically and flexibly. We show that the learned routing decisions and experts partially rediscover human categorization of NLP tasks -- certain experts are strongly associated...
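The abstract above describes the core architecture: a pool of transformer-layer experts plus a router that selects among them per task. Below is a minimal sketch of that idea in PyTorch; the module name TaskLevelMoE, the task-embedding router, and the top-k selection are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of task-level mixture-of-experts routing (illustrative only).
import torch
import torch.nn as nn


class TaskLevelMoE(nn.Module):
    """Routes each task to a small set of expert transformer layers."""

    def __init__(self, d_model: int, num_experts: int, num_tasks: int, top_k: int = 1):
        super().__init__()
        # Each expert is a standard transformer encoder layer.
        self.experts = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(num_experts)
        )
        # The router scores experts from a learned task embedding,
        # so the routing decision is made per task rather than per token.
        self.task_embedding = nn.Embedding(num_tasks, d_model)
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); task_id: (batch,) integer task indices
        logits = self.router(self.task_embedding(task_id))   # (batch, num_experts)
        weights = torch.softmax(logits, dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)     # pick k experts per task
        out = torch.zeros_like(hidden)
        for b in range(hidden.size(0)):
            for w, idx in zip(top_w[b], top_idx[b]):
                out[b] = out[b] + w * self.experts[int(idx)](hidden[b : b + 1])[0]
        return out


# Example: 4 experts shared across 8 tasks, each example routed by its task id.
moe = TaskLevelMoE(d_model=64, num_experts=4, num_tasks=8)
x = torch.randn(2, 10, 64)                 # (batch=2, seq_len=10, d_model=64)
y = moe(x, task_id=torch.tensor([0, 3]))
print(y.shape)                             # torch.Size([2, 10, 64])
```

Routing on a task embedding rather than on individual tokens keeps the choice of experts constant within a task, which is the distinguishing feature of task-level (as opposed to token-level) mixture-of-experts.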
Deep imitation learning requires many expert demonstrations, which can be hard to obtain, especially...
Recent work has shown the promise of creating generalist, transformer-based, models for language, vi...
In open-ended continuous environments, robots need to learn multiple parameter...
Mixtures of Experts combine the outputs of several “expert” networks, each of which specializes in ...
Sparsely-activated Mixture-of-experts (MoE) models allow the number of parameters to greatly increas...
Mixture of Experts (MoE) is a classical architecture for ensembles where each member is specialised...
Learning discriminative task-specific features simultaneously for multiple distinct tasks is a funda...
Neural Machine Translation (NMT) is notorious for its need for large amounts of bilingual data. An ef...
The mixture-of-experts model is a static neural network architecture in that it learns input-output ...
In this work we present a Mixture of Task-Aware Experts Network for Machine Reading Comprehension on...
Deep learning models for vision tasks are trained on large datasets under the assumption that there ...
Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable pa...
Structure prediction (SP) tasks are important in natural language understanding in the sense that th...
In the context of multi-task learning, neural networks with branched architectures have often been e...
The Transformer model is a very recent, fast and powerful discovery in neural machine translation. W...
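Several of the related abstracts above (e.g., the classic Mixtures of Experts entry) describe combining the outputs of several expert networks through a gating function. A minimal sketch of that soft combination, again in PyTorch; SoftMoE and the linear experts are illustrative assumptions, not any particular paper's model.

```python
# Minimal sketch of the classic soft mixture-of-experts combination:
# a gating network produces one weight per expert and the output is the
# weighted sum of the expert outputs. All names are illustrative.
import torch
import torch.nn as nn


class SoftMoE(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(num_experts))
        self.gate = nn.Linear(d_in, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in)
        weights = torch.softmax(self.gate(x), dim=-1)               # (batch, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], 1)   # (batch, num_experts, d_out)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)      # (batch, d_out)
```

Sparsely-activated variants keep the same structure but evaluate only the top-scoring experts instead of all of them, which is how MoE models grow their parameter count without a matching growth in compute.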