The bandit classification problem considers learning the labels of a time-indexed data stream under mere "hit-or-miss" binary feedback. Adapting the OVA ("one-versus-all") hinge loss setup, we develop a sparse and lightweight solution to this problem. The resulting sequential norm-minimal update solves the classification problem in finite time in the separable case, provided enough redundancy is present in the data. An O(√T) regret is moreover expected in the non-separable case. The algorithm proves effective on both large-scale text-mining and machine learning datasets, with (i) a favorable comparison with the more demanding confidence-based second-order bandit setups on large-scale datasets and (ii) a good sparsity and efficacy whe...
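To make the "hit-or-miss" feedback protocol concrete, here is a minimal Python sketch of one learning round. It is an illustration under stated assumptions, not the norm-minimal update of the abstract: it swaps in a plain perceptron-style step on a one-versus-all hinge condition, it is purely greedy (no exploration), and the names `predict`, `bandit_ova_step`, and `lr` are hypothetical.

```python
def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def predict(weights, x):
    """Return the index of the class with the highest linear score (first on ties)."""
    scores = [dot(w, x) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def bandit_ova_step(weights, x, true_label, lr=1.0):
    """One round: guess a label, observe only hit/miss, update the guessed class's row.

    Assumption: a simple perceptron-style step stands in for the paper's update.
    """
    guess = predict(weights, x)
    hit = (guess == true_label)       # the only feedback revealed to the learner
    target = 1.0 if hit else -1.0     # binary OVA target for the guessed class only
    if target * dot(weights[guess], x) < 1.0:   # hinge loss is active
        weights[guess] = [w_i + lr * target * x_i
                          for w_i, x_i in zip(weights[guess], x)]
    return hit

# Toy separable stream: three classes, one-hot inputs, label k for input e_k.
data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
weights = [[0.0, 0.0, 0.0] for _ in range(3)]
for _ in range(5):                    # a few passes over the stream
    for x, y in data:
        bandit_ova_step(weights, x, y)
```

Note that the true label is used only to compute the binary `hit` signal; on a miss the learner never learns which class was correct, which is what distinguishes this setting from full-information multiclass learning.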
We introduce and study a partial-information model of online learning, where a decision maker repeat...
This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multic...
We consider online learning in finite stochastic Markovian environments where ...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
In this paper we consider the problem of online stochastic optimization of a l...
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side...
We consider online learning problems under a partial observability model cap...
We investigate a nonstochastic bandit setting in which the loss of an action i...
In the fixed budget thresholding bandit problem, an algorithm sequentially all...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...
We consider the problem of online multiclass prediction in the bandit setting. Compared with the ful...
This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedb...
This document is the full version of an extended abstract published in the proceedings of COLT 2017....