We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces. Learning from extreme bandit feedback is ubiquitous in recommendation systems, in which billions of decisions are made each day over sets of millions of choices, yielding massive observational data. In these large-scale real-world applications, supervised learning frameworks such as eXtreme Multi-label Classification (XMC) are widely used even though they incur significant bias due to the mismatch between bandit feedback and supervised labels. This bias can be mitigated by importance sampling techniques, but these techniques suffer from impractically high variance when the number of actions is large. ...
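The toy simulation below is a minimal sketch of why importance sampling becomes impractical as the action space grows. It uses plain inverse propensity scoring (IPS), which may differ from the estimator any of these papers actually uses. The setup is a hypothetical assumption, not taken from the abstract:

- a uniform logging policy over K actions;
- a target policy that always plays action 0;
- a reward of 1 for action 0 and 0 for every other action.

The helper names (`simulate_uniform_logging`, `always_action_zero`, `empirical_ips_variance`) are illustrative only. In this setup the IPS estimate is unbiased (the true value is 1), but its variance is exactly (K - 1) / n for n logged samples. So the variance grows linearly with the number of actions.

```python
import random
import statistics
from typing import Callable, Hashable, List, Tuple

# One logged interaction: (context, action, reward, logging-policy propensity).
LogEntry = Tuple[Hashable, int, float, float]


def ips_estimate(logs: List[LogEntry],
                 target_prob: Callable[[Hashable, int], float]) -> float:
    """Inverse propensity scoring (IPS) estimate of a target policy's mean reward."""
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by pi_target(a | x) / pi_logging(a | x).
        total += reward * target_prob(context, action) / propensity
    return total / len(logs)


def simulate_uniform_logging(num_actions: int, num_samples: int,
                             rng: random.Random) -> List[LogEntry]:
    """Hypothetical toy log: uniform logging policy; only action 0 pays reward 1."""
    logs = []
    for _ in range(num_samples):
        action = rng.randrange(num_actions)
        reward = 1.0 if action == 0 else 0.0
        logs.append((None, action, reward, 1.0 / num_actions))
    return logs


def always_action_zero(context: Hashable, action: int) -> float:
    """Hypothetical deterministic target policy that always plays action 0."""
    return 1.0 if action == 0 else 0.0


def empirical_ips_variance(num_actions: int, num_samples: int = 1000,
                           trials: int = 200, seed: int = 0) -> float:
    """Sample variance of the IPS estimate across independently simulated logs."""
    rng = random.Random(seed)
    estimates = [
        ips_estimate(simulate_uniform_logging(num_actions, num_samples, rng),
                     always_action_zero)
        for _ in range(trials)
    ]
    return statistics.variance(estimates)


if __name__ == "__main__":
    for k in (10, 1000):
        # Compare the simulated variance with the closed form (K - 1) / n.
        print(k, empirical_ips_variance(k), (k - 1) / 1000)
```

With 1,000 logged samples, the simulated variance should land near 0.009 for K = 10 and near 1.0 for K = 1000. That is roughly a hundredfold increase for the same amount of data. Real systems with millions of actions face this variance growth at a far larger scale.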
We investigate the structural properties of certain sequential decision-making problems with limited...
Contextual bandits are canonical models for sequential decision-making under uncertainty in environm...
Sequential decision-making is an iterative process between a learning agent and an environment. We s...
We develop a learning principle and an efficient algorithm for batch learning from logged bandit fee...
This thesis considers the multi-armed bandit (MAB) problem, both the traditional bandit feedback and...
Inspired by advertising markets, we consider large-scale sequential decision-making problems in whic...
In many areas of medicine, security, and life sciences, we want to allocate limited resources to di...
We consider a special case of bandit problems, named batched bandits, in which an agent observes bat...
This paper identifies a severe problem of the counterfactual risk estimator typically used in batch...
What is the most statistically efficient way to do off-policy optimization with batch data from band...
This paper considers the use of a simple posterior sampling algorithm to balance between exploration...
In performative prediction, the deployment of a predictive model triggers a shift in the data distri...
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation t...
Bandit learning has been an increasingly popular design choice for recommender systems. Despite the s...