In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders follows a distinct bell-shaped curve, with the highest anisotropy concentrated in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we find that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into a higher-dimensional space; this is followed by a compression phase towards the end of training, during which dimensionality decreases, suggesting a refinement of the representations.
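To make the central quantity concrete, the sketch below shows one common way to estimate layer-wise anisotropy: the mean pairwise cosine similarity between token embeddings within each layer, a standard operationalization in prior work. The model checkpoint, input text, and this particular estimator are illustrative assumptions for exposition, not the exact experimental setup of this study.

```python
# Minimal sketch: layer-wise anisotropy as mean pairwise cosine similarity
# of token embeddings. Checkpoint and text are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumption: any encoder or decoder checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

text = "Transformers learn layered representations of language."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # hidden_states is a tuple of (num_layers + 1) tensors of shape [1, seq, dim]
    hidden_states = model(**inputs).hidden_states

for layer_idx, layer in enumerate(hidden_states):
    tokens = layer.squeeze(0)                                  # [seq, dim]
    normed = torch.nn.functional.normalize(tokens, dim=-1)
    sims = normed @ normed.T                                   # pairwise cosine similarities
    # average over off-diagonal entries only (exclude self-similarity)
    off_diag = sims[~torch.eye(sims.size(0), dtype=torch.bool)]
    print(f"layer {layer_idx:2d}: anisotropy ~ {off_diag.mean().item():.3f}")
```

Under this estimator, a value near 1 indicates a highly anisotropic layer whose token embeddings point in similar directions, while values near 0 indicate a more isotropic distribution.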