Recent studies have determined that the learned token embeddings of large-scale neural language models degenerate into an anisotropic distribution with a narrow-cone shape. This phenomenon, called the representation degeneration problem, increases the overall similarity between token embeddings, which negatively affects model performance. Although existing methods that address the degeneration problem based on observations of the phenomenon improve text generation performance, the training dynamics of token embeddings behind the degeneration problem remain unexplored. In this study, we analyze the training dynamics of token embeddings, focusing on rare token embeddings. We demo...
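To make the notion of anisotropy concrete, the degree of degeneration is commonly proxied by the average pairwise cosine similarity of the embedding matrix: values well above zero indicate that the embeddings collapse into a narrow cone rather than spreading over the sphere. Below is a minimal sketch of such a measurement, assuming a Hugging Face GPT-2 checkpoint and a random 2,000-token subsample of the vocabulary; both choices are illustrative assumptions, not the exact setup of the paper.

```python
# Minimal sketch (illustrative, not the paper's method): estimate anisotropy
# of a model's input token embeddings via average pairwise cosine similarity.
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")  # assumed checkpoint
emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)

# Subsample tokens to keep the pairwise computation cheap (assumed size).
idx = torch.randperm(emb.size(0))[:2000]
sample = torch.nn.functional.normalize(emb[idx], dim=-1)

# Mean off-diagonal cosine similarity; higher values suggest a narrow cone.
cos = sample @ sample.T
mask = ~torch.eye(cos.size(0), dtype=torch.bool)
print(f"mean pairwise cosine similarity: {cos[mask].mean().item():.4f}")
```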
Over-parameterized neural models have become dominant in Natural Language Processing. Increasing the...
Despite considerable advances in neural language modeling, it remains an open question what the best...
Text generation is of great importance to many natural language processing applications. However, ma...
Neural language models often fail to generate diverse and informative texts, limiting their applicab...
Static subword tokenization algorithms have been an essential component of rec...
Generative language models are usually pretrained on large text corpora via predicting the next token...
Gradient-based adversarial training is widely used in improving the robustness of neural networks, w...
Pretraining deep neural network architectures with a language modeling objective has brought large i...
Neural text generation models that are conditioned on a given input (e.g., machine translation and i...
Deep neural language models like GPT-2 are undoubtedly strong at text generation, but often require ...
Large-scale neural language models have made impressive strides in natural language generation. Howe...
In this study, we compare token representations constructed from visual features (i.e., pixe...