We study the adversarial bandit problem with $S$ switches of the best arm, where $S$ is unknown. To handle this problem, we adopt the master-base framework with the online mirror descent (OMD) method. We first provide a master-base algorithm with basic OMD, achieving a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$. To improve the regret bound with respect to $T$, we propose using adaptive learning rates for OMD to control the variance of the loss estimators, achieving $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
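For readers unfamiliar with OMD in the bandit setting, the following is a minimal sketch of a single base learner: OMD with a negative-entropy regularizer and a fixed learning rate, which reduces to an Exp3-style exponential-weights update on importance-weighted loss estimators. It is not the master-base algorithm or the adaptive-learning-rate scheme described above. The function names, the fixed rate `eta`, and the loss table passed in are all assumptions made for illustration.

```python
import math
import random


def _distribution(cum_est, eta):
    """Closed-form negative-entropy OMD step: p_i proportional to exp(-eta * cum_est_i)."""
    m = min(cum_est)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (c - m)) for c in cum_est]
    z = sum(w)
    return [wi / z for wi in w]


def omd_negentropy_bandit(losses, eta, seed=0):
    """Run fixed-rate OMD (Exp3-style) on a table of losses in [0, 1].

    losses: list of T rounds, each a list of K per-arm losses.
    Only the loss of the pulled arm is used, as in bandit feedback.
    Returns (total incurred loss, arm distribution after the last round).
    """
    rng = random.Random(seed)
    K = len(losses[0])
    cum_est = [0.0] * K  # cumulative importance-weighted loss estimates
    total = 0.0
    for loss_t in losses:
        p = _distribution(cum_est, eta)
        arm = rng.choices(range(K), weights=p)[0]
        total += loss_t[arm]
        # Unbiased estimator: observed loss divided by the pull probability.
        cum_est[arm] += loss_t[arm] / p[arm]
    return total, _distribution(cum_est, eta)
```

The estimator `loss_t[arm] / p[arm]` is unbiased, but its variance grows as `p[arm]` shrinks. That variance is the quantity the adaptive learning rates mentioned above are meant to control.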
We demonstrate a modification of the algorithm of Dani et al for the online linear optimization prob...
We study online reinforcement learning in linear Markov decision processes with adversarial losses a...
Online learning algorithms are designed to learn even when their input is generated by an adversary....
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of predi...
We study online learning with bandit feedback across multiple tasks, with the goal of improving aver...
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BO...
In bandit with distribution shifts, one aims to automatically adapt to unknown changes in reward dis...
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environm...
Consider the online convex optimization problem, in which a player has to choose actions iterativel...
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouye...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stoc...
We provide new lower bounds on the regret that must be suffered by adversarial...
We study a new class of online learning problems where each of the online algorithm’s actions is ass...
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation t...