This dissertation is dedicated to the study of Thompson Sampling (TS) algorithms, designed to address the exploration-exploitation dilemma inherent in sequential decision-making under uncertainty. As opposed to algorithms based on the optimism-in-the-face-of-uncertainty (OFU) principle, where exploration is performed by selecting the most favorable model within the set of plausible ones, TS algorithms rely on randomization to enhance exploration, and are thus much more computationally efficient. We focus on linearly parametrized problems that allow for continuous state-action spaces, namely the Linear Bandit (LB) problems and the Linear Quadratic (LQ) control problems. We derive two novel analyses for the regret of TS algorithms...
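To make the contrast with OFU concrete, here is a minimal sketch of Thompson Sampling for a linear bandit, assuming a Gaussian posterior over the unknown parameter, a known finite arm set, and an inflation factor v; the names and parameters (linear_ts, lam, v, reward_fn) are illustrative, not taken from the dissertation.

```python
import numpy as np

def linear_ts(arms, reward_fn, T, lam=1.0, v=1.0, rng=None):
    """Thompson Sampling for a linear bandit: draw a parameter from a
    Gaussian posterior and act greedily with respect to the draw."""
    rng = rng or np.random.default_rng()
    d = arms.shape[1]
    V = lam * np.eye(d)   # regularized design matrix
    b = np.zeros(d)       # reward-weighted feature sums
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge estimate of the unknown parameter
        # Randomized exploration: perturb the estimate with posterior
        # noise instead of maximizing over a confidence set as in OFU.
        theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
        x = arms[int(np.argmax(arms @ theta_tilde))]  # greedy on the sample
        r = reward_fn(x)
        V += np.outer(x, x)
        b += r * x
    return np.linalg.inv(V) @ b  # final parameter estimate
```

The single scalar v controls how far the sampled parameter deviates from the estimate; this one random draw replaces the optimization over a confidence set that makes OFU approaches computationally demanding in continuous action spaces.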
This paper considers the use of a simple posterior sampling algorithm to balance between exploration...
Originally motivated by default risk management applications, this paper investigates...
This paper introduces and addresses a wide class of stochastic bandit problems...
We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control...
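As a rough illustration of how posterior sampling extends from bandits to LQ control, below is a sketch of a single decision step, assuming a Gaussian posterior over the flattened system matrices [A | B] and known quadratic costs Q and R; all names (ts_lq_step, theta_mean, theta_cov, n, m) are hypothetical, and the Riccati solve relies on SciPy's solve_discrete_are.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def ts_lq_step(x, theta_mean, theta_cov, n, m, Q, R, rng):
    """One round of Thompson Sampling for LQ control: sample a model
    (A, B) from the Gaussian posterior, solve the Riccati equation for
    that sample, and play the optimal linear feedback it induces."""
    # Draw a flattened [A | B] parameter vector from the posterior.
    theta = rng.multivariate_normal(theta_mean, theta_cov)
    AB = theta.reshape(n, n + m)
    A, B = AB[:, :n], AB[:, n:]
    # Certainty-equivalent control for the *sampled* system; the solve
    # assumes the sampled (A, B) pair is stabilizable.
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return -K @ x  # control input applied to the true system
```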
We derive an alternative proof for the regret of Thompson sampling (TS) in the...
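Assuming this truncated entry refers to the standard frequentist analysis of TS in the linear bandit setting, the order of regret established in that line of work is, up to logarithmic factors,

R(T) = \tilde{O}\big( d^{3/2} \sqrt{T} \big),

where d is the feature dimension and T the horizon, a factor \sqrt{d} above the best known OFU-type bound of \tilde{O}(d\sqrt{T}).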
The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in...
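For the classical K-armed case described above, Thompson Sampling reduces to a few lines; the sketch below assumes Bernoulli rewards with Beta(1, 1) priors, and the arm means in the usage line are made up for illustration.

```python
import numpy as np

def bernoulli_ts(arm_means, T, rng=None):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors:
    draw one posterior sample per arm, pull the arm with the largest
    sample, and update that arm's Beta posterior with the reward."""
    rng = rng or np.random.default_rng()
    K = len(arm_means)
    alpha = np.ones(K)  # posterior successes + 1
    beta = np.ones(K)   # posterior failures + 1
    total = 0
    for _ in range(T):
        k = int(np.argmax(rng.beta(alpha, beta)))  # randomized exploration
        r = int(rng.random() < arm_means[k])       # Bernoulli reward
        alpha[k] += r
        beta[k] += 1 - r
        total += r
    return total

# Illustrative means: TS quickly concentrates its pulls on the 0.8 arm.
print(bernoulli_ts([0.2, 0.5, 0.8], T=5000))
```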
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under ...
Thompson Sampling...
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploration...