Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, which introduces hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, assume that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, in which a single representation space is shared across many different categorical features.
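To make the contrast concrete, below is a minimal sketch (not the paper's implementation) of the two layouts the abstract describes: an independent table per feature versus one shared, multiplexed table that every feature hashes into. The class names, the salted-hash scheme, and parameters such as num_buckets are illustrative assumptions.

```python
import numpy as np


class PerFeatureEmbeddings:
    """Standard approach: one independent embedding table per feature.

    The parameter count grows with the vocabulary of every feature,
    which is what produces hundreds of billions of parameters for
    extremely high-cardinality features.
    """

    def __init__(self, vocab_sizes, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (vocab_size x dim) table per feature.
        self.tables = {
            name: rng.normal(0.0, 0.01, size=(vocab, dim))
            for name, vocab in vocab_sizes.items()
        }

    def lookup(self, feature, token_id):
        return self.tables[feature][token_id]


class MultiplexedEmbedding:
    """Multiplexed layout: a single shared table serves all features.

    Each feature hashes its token ids into the common representation
    space, so the parameter count is fixed by num_buckets rather than
    by the sum of the per-feature vocabularies.
    """

    def __init__(self, num_buckets, dim, feature_names, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(0.0, 0.01, size=(num_buckets, dim))
        # Per-feature salts make collisions across features pseudo-random
        # rather than systematic; a production system would use a stronger
        # hash than Python's built-in.
        self.salts = {name: i + 1 for i, name in enumerate(feature_names)}
        self.num_buckets = num_buckets

    def lookup(self, feature, token_id):
        bucket = hash((self.salts[feature], token_id)) % self.num_buckets
        return self.table[bucket]


# Example: two high-cardinality features sharing one 100k-row table.
features = {"query_token": 50_000_000, "ad_id": 2_000_000_000}
shared = MultiplexedEmbedding(num_buckets=100_000, dim=16,
                              feature_names=features)
vec = shared.lookup("ad_id", token_id=123_456_789)  # shape: (16,)
```

Under this layout, adding a new feature changes only its hash salt, not the parameter budget, which is the property the abstract's single shared representation space refers to.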