The bandit problem models a sequential decision process between a player and an environment. At the outset, the player has no prior knowledge of the environment, but gains experience by repeatedly pulling one arm from a set of arms and receiving a reward from the environment. The goal is to maximize the cumulative reward. In this dissertation, we focus on stochastic contextual bandit problems, where each arm is associated with a feature vector and the observed reward of each pulled arm follows a stochastic process. Bandit problems have solid theoretical foundations in the literature and substantial applications in real tasks such as recommender systems, online advertising, and clinical trials. However, there ...
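The interaction protocol described in the abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the dissertation's method: the reward is assumed to be linear in the arm's feature vector plus Gaussian noise, and the player uses a plain epsilon-greedy rule with a stochastic-gradient estimate of the unknown parameter. The names `run_bandit`, `theta_true`, `theta_hat`, and all default values are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def run_bandit(num_rounds=2000, num_arms=5, dim=3,
               epsilon=0.1, step=0.05, noise=0.1, seed=0):
    """Simulate a stochastic contextual bandit with an epsilon-greedy player.

    Assumption: expected reward = <theta_true, x> for the pulled arm's
    feature vector x. The player never sees theta_true directly.
    """
    rng = random.Random(seed)
    theta_true = [rng.uniform(-1, 1) for _ in range(dim)]  # environment's secret
    theta_hat = [0.0] * dim                                # player's estimate
    total_reward = 0.0
    total_regret = 0.0

    for _ in range(num_rounds):
        # Each arm is presented with a fresh feature vector this round.
        arms = [[rng.uniform(-1, 1) for _ in range(dim)]
                for _ in range(num_arms)]

        # Explore with probability epsilon, otherwise exploit the estimate.
        if rng.random() < epsilon:
            choice = rng.randrange(num_arms)
        else:
            choice = max(range(num_arms),
                         key=lambda a: dot(theta_hat, arms[a]))

        x = arms[choice]
        reward = dot(theta_true, x) + rng.gauss(0, noise)  # stochastic reward

        # One stochastic-gradient step on the squared prediction error.
        err = reward - dot(theta_hat, x)
        theta_hat = [t + step * err * xi for t, xi in zip(theta_hat, x)]

        total_reward += reward
        # Regret: expected reward of the best arm minus that of the pulled arm.
        best = max(dot(theta_true, a) for a in arms)
        total_regret += best - dot(theta_true, x)

    return total_reward, total_regret, theta_true, theta_hat


if __name__ == "__main__":
    reward, regret, theta_true, theta_hat = run_bandit()
    print("cumulative reward:", round(reward, 2))
    print("cumulative regret:", round(regret, 2))
```

Maximizing cumulative reward is equivalent to minimizing the cumulative regret tracked here. Epsilon-greedy was chosen only because it is short; confidence-bound or posterior-sampling rules would slot into the same loop at the arm-selection step.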
We consider the stochastic contextual bandit problem with additional regulariz...
We propose an online algorithm for sequential learning in the contextual multiarmed bandit setting. ...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly ...
Stochastic linear contextual bandit algorithms have substantial applications in practice, such as re...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed...
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploit...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news a...
We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on ...
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stoc...
We present a formal model of human decision-making in explore-exploit tasks using the conte...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...