We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.
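To make the feedback model concrete, the following is a minimal sketch of an Exp3-style learner with graph-structured feedback: after playing an action, the learner observes the losses of all actions in that action's neighborhood and forms importance-weighted loss estimates. The function name `exp3_graph` and its interface (`neighbors`, `losses`) are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import math
import random

def exp3_graph(n_actions, neighbors, losses, T, eta=0.1):
    """Exp3-style learner under graph feedback (illustrative sketch).

    neighbors[j]: set of actions whose play reveals action j's loss
                  (assumed to contain j itself, i.e. self-loops).
    losses(t):    returns the loss vector at round t, values in [0, 1].
    Returns the learner's cumulative loss over T rounds.
    """
    weights = [1.0] * n_actions
    total_loss = 0.0
    for t in range(T):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Sample an action from the exponential-weights distribution.
        action = random.choices(range(n_actions), weights=probs)[0]
        loss = losses(t)
        total_loss += loss[action]
        # Action j's loss is observed whenever the played action lies
        # in j's neighborhood; divide by the observation probability
        # to keep the loss estimate unbiased.
        for j in range(n_actions):
            if action in neighbors[j]:
                obs_prob = sum(probs[i] for i in neighbors[j])
                est = loss[j] / obs_prob
                weights[j] *= math.exp(-eta * est)
    return total_loss

# Usage: with the complete graph (full information) the learner quickly
# concentrates on the single zero-loss action.
random.seed(0)
full = [set(range(3)) for _ in range(3)]
cum = exp3_graph(3, full, lambda t: [1.0, 0.0, 1.0], T=200)
```

With the complete observability graph every loss is observed each round, so the estimates reduce to the true losses and the sketch degenerates to standard exponential weights; the bandit setting corresponds to a graph with self-loops only.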