Recent research has pointed to a limitation of word-level neural language models with softmax outputs. This limitation, known as the softmax bottleneck, refers to the inability of these models to produce high-rank log-probability (log P) matrices. Various solutions have been proposed to break this bottleneck, including Mixture of Softmaxes, SigSoftmax, and Linear Monotonic Softmax with Piecewise Linear Increasing Functions, all of which were reported to improve test perplexity. A natural inference from these results is a strong positive correlation between the rank of the log P matrix and the model's performance. In this work, we show via an extensive empirical study that such a correlation is fairly weak and th...
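Since the abstract above names Mixture of Softmaxes as one remedy for the bottleneck, a minimal NumPy sketch may help make the rank argument concrete. The parameter names (W_prior, W_proj, W_out) and shapes below are illustrative assumptions, not the cited paper's actual implementation; the point is only that mixing in probability space, rather than logit space, frees the resulting log P matrix from the rank bound of a single softmax head.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mos_log_probs(h, W_prior, W_proj, W_out):
    """Mixture-of-Softmaxes output head (hypothetical names and shapes).

    h:       (batch, d)   context vectors from the encoder
    W_prior: (d, K)       produces the K mixture weights
    W_proj:  (K, d, d)    per-component context projections
    W_out:   (d, V)       shared output embedding
    """
    pi = softmax(h @ W_prior)                     # (batch, K) mixture weights
    probs = np.zeros((h.shape[0], W_out.shape[1]))
    for k in range(W_proj.shape[0]):
        h_k = np.tanh(h @ W_proj[k])              # component-specific context
        probs += pi[:, k:k + 1] * softmax(h_k @ W_out)
    # Mixing happens in probability space, so log(probs) is not limited
    # to rank d + 1 the way a single-softmax log P matrix is.
    return np.log(probs)

rng = np.random.default_rng(0)
d, V, K, B = 8, 50, 3, 4                          # toy sizes
h = rng.standard_normal((B, d))
logp = mos_log_probs(h,
                     rng.standard_normal((d, K)),
                     rng.standard_normal((K, d, d)),
                     rng.standard_normal((d, V)))
print(logp.shape)  # (4, 50)
```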
We describe a new model for learning meaningful representations of text documents from an unlabeled...
For resource-rich languages, recent works have shown Neural Network-based Language Models (NNLMs) to...
This paper presents a new method to reduce the computational cost when using Neural Networks as...
Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we...
Softmax is the de facto standard in modern neural networks for language processing when it comes to ...
Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successfu...
Recurrent Neural Networks (RNNs) have been very successful in many state-of-the-art solutions for na...
In many classification and prediction problems it is known that the response variable depends on cer...
Virtually any modern speech recognition system relies on count-based language models. In this thesis...
The quality of translations produced by statistical machine translation (SMT) systems crucially dep...
Neural network language models, or continuous-space language models (CSLMs), have been shown to impr...
Large transformer models have achieved state-of-the-art results in numerous natural language process...
Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-th...
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating funct...
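To illustrate the gating idea this truncated abstract refers to, here is a minimal mixture-of-experts layer in NumPy. The names moe_layer, expert_weights, and gate_weights are hypothetical, and the dense softmax gate below is a stand-in for whatever gating function the cited work actually uses: each expert transforms the input independently, and the gate's weights decide how much each expert contributes.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, expert_weights, gate_weights):
    """Combine K linear experts with softmax gating (hypothetical names).

    x:              (batch, d_in)
    expert_weights: (K, d_in, d_out)  one linear expert per slice
    gate_weights:   (d_in, K)         gating network
    """
    gates = softmax(x @ gate_weights)                          # (batch, K)
    expert_out = np.einsum('bi,kio->bko', x, expert_weights)   # (batch, K, d_out)
    return np.einsum('bk,bko->bo', gates, expert_out)          # gated mixture

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
y = moe_layer(x,
              rng.standard_normal((3, 8, 16)),
              rng.standard_normal((8, 3)))
print(y.shape)  # (4, 16)
```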