We propose a new sequential decision-making setting that combines key aspects of two established online learning problems with bandit feedback. The optimal action to play at any given moment depends on an underlying latent state that is not directly observable by the agent. Each state is associated with a context distribution, possibly corrupted, from which observed contexts allow the agent to identify the state. Furthermore, states evolve in a Markovian fashion, so the state history provides useful information for estimating the current state. In the proposed setting, we tackle the challenge of deciding which of these two sources of information the agent should base its arm selection on. We present an algorithm that uses a referee to dynamically combine the ...
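The setting described above can be illustrated with a minimal simulation sketch. All names and numbers here are hypothetical (the abstract does not specify them): a latent state evolves via a Markov transition matrix, each state emits contexts from its own distribution, and a referee weight `w` (fixed here, whereas the paper combines the sources dynamically) blends the Markov prior from state history with the context likelihood to form a belief over states.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical environment parameters (illustrative, not from the paper) ---
n_states, n_arms = 2, 3
P = np.array([[0.9, 0.1],                 # Markov transition matrix over latent states
              [0.2, 0.8]])
context_means = np.array([-1.0, 1.0])     # state-dependent (Gaussian) context distributions
reward_means = np.array([[1.0, 0.0, 0.5], # expected reward per (state, arm)
                         [0.0, 1.0, 0.5]])

def context_likelihood(x):
    """Unnormalized P(context | state) under unit-variance Gaussians."""
    return np.exp(-0.5 * (x - context_means) ** 2)

state = 0
belief = np.array([0.5, 0.5])             # agent's belief over latent states
total_reward = 0.0
for t in range(1000):
    state = rng.choice(n_states, p=P[state])   # latent state evolves (hidden)
    x = rng.normal(context_means[state], 1.0)  # possibly-noisy observed context

    prior = belief @ P                   # source 1: Markov state history
    like = context_likelihood(x)         # source 2: observed context
    w = 0.5                              # referee weight blending the two sources
    posterior = (prior ** w) * (like ** (1 - w))
    belief = posterior / posterior.sum()

    arm = int(np.argmax(belief @ reward_means))  # greedy arm under current belief
    total_reward += rng.normal(reward_means[state, arm], 0.1)
```

The geometric blend `prior**w * like**(1-w)` is one simple way to interpolate between trusting the state history (`w = 1`) and trusting the context alone (`w = 0`); the referee in the proposed algorithm would adapt this trade-off online.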
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
Contextual bandits are canonical models for sequential decision-making under uncertainty in environm...
We consider a Latent Bandit problem where the latent state keeps changing in time according to an un...
The data explosion and the development of artificial intelligence (AI) have fueled the demand for recomme...
Learning an action policy for autonomous agents in a decentralized multi-agent environment has remained...
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 com...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Bandit problems provide an interesting and widely-used setting for the study of sequential decision-...
Many well-studied online decision-making and learning models rely on the assumption that the environ...
Multi-armed bandit (MAB) is a classic model for understanding the exploration-exploitation trade-off...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
In many real-world sequential decision-making problems, an action does not immediately reflect on th...
We present a method to solve the problem of choosing a set of adverts to display to each of a sequen...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...