An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the "Doubling Trick". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O...
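The doubling trick described above can be sketched in a few lines: restart a horizon-dependent algorithm on a growing sequence of horizons. This is a minimal illustration, not the paper's implementation; the toy policy `FixedHorizonBandit` and the helper names are assumptions, and a geometric sequence $T_i = T_0 b^i$ stands in for the general families studied.

```python
# Sketch of the doubling trick: wrap a horizon-dependent bandit
# algorithm so it becomes anytime by restarting it on each horizon
# of a growing sequence. All class/function names are illustrative.

import random


class FixedHorizonBandit:
    """Toy horizon-aware policy: pull each arm once, then play the
    empirical best. A real instance would be e.g. a UCB variant
    tuned for the known horizon."""

    def __init__(self, n_arms, horizon):
        self.n_arms = n_arms
        self.horizon = horizon  # known in advance by the inner policy
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def choose(self):
        # Initial exploration: one pull per arm.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        # Then exploit the empirical best arm.
        return max(range(self.n_arms),
                   key=lambda a: self.sums[a] / self.counts[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def geometric_horizons(T0=100, b=2):
    """Geometric doubling sequence T_i = T0 * b^i."""
    T = T0
    while True:
        yield T
        T *= b


def anytime_play(total_T, arm_means, horizons, T0_arms=None):
    """Run the doubling-trick wrapper for total_T steps on a
    Bernoulli bandit with the given means; returns total reward."""
    n_arms = len(arm_means)
    t, rewards = 0, 0.0
    for T_i in horizons:
        # Restart a fresh instance for the new guessed horizon T_i.
        algo = FixedHorizonBandit(n_arms, T_i)
        for _ in range(T_i):
            if t >= total_T:
                return rewards
            arm = algo.choose()
            r = 1.0 if random.random() < arm_means[arm] else 0.0
            algo.update(arm, r)
            rewards += r
            t += 1
    return rewards
```

Swapping `geometric_horizons` for an exponentially growing sequence changes which regret bounds the wrapper can conserve, which is exactly the distinction the abstract draws between the two families.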
Combinatorial stochastic semi-bandits appear naturally in many contexts where the exploration/exploi...
In this document, we give an overview of recent contributions to the mathematics of statistical sequ...
The topics addressed in this thesis lie in statistical machine learning and sequential statistics. Ou...
The main topics addressed in this thesis lie in the general domain of sequential learning, and in par...
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the...
We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds...
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and ...
This thesis studies the following topics in Machine Learning: Bandit theory, Statistical learning an...
In the classical multi-armed bandit problem, d arms are available to the decis...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...