We study the problem of selecting K arms with the highest expected rewards in a stochastic N-armed bandit game. Instead of using existing evaluation metrics (e.g., misidentification probability (Bubeck et al., 2013) or the metric in Explore-K (Kalyanakrishnan & Stone, 2010)), we propose to use the aggregate regret, which is defined as the gap between the average reward of the optimal solution and that of our solution. Besides being a natural metric in itself, we argue that in many applications, such as our motivating example from crowdsourcing, the aggregate regret bound is more suitable. We propose a new PAC algorithm which, with probability at least 1−δ, identifies a set of K arms with regret at most ε. We provide a detailed analys...
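The aggregate regret defined in the abstract above can be illustrated with a minimal sketch. It assumes the true expected rewards are known, which holds only in simulation; the function name `aggregate_regret` and the example means are hypothetical and are not taken from the paper.

```python
def aggregate_regret(means, selected, k):
    """Gap between the average expected reward of the best k arms
    and the average expected reward of the arms in `selected`.

    means    -- true expected reward of each arm (assumed known here)
    selected -- indices of the k arms returned by some algorithm
    """
    if len(selected) != k or len(set(selected)) != k:
        raise ValueError("selected must contain exactly k distinct arm indices")
    best = sorted(means, reverse=True)[:k]        # optimal solution: top-k means
    chosen = [means[i] for i in selected]         # means of the arms we picked
    return sum(best) / k - sum(chosen) / k


# Hypothetical example: four arms, select K = 2.
means = [0.9, 0.8, 0.5, 0.4]
regret = aggregate_regret(means, selected=[0, 2], k=2)
```

Here the optimal pair averages 0.85 and the chosen pair averages 0.7, so the aggregate regret is about 0.15. A PAC guarantee in the sense of the abstract would require this quantity to be at most ε with probability at least 1−δ.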
The stochastic multi-armed bandit model is a simple abstraction that has prove...
We consider a stochastic bandit problem with infinitely many arms. In this set...
In this thesis, we study strategies for sequential resource allocation, under the so-called stochast...
We study the problem of selecting K arms with the highest expected rewards in a stochastic n-armed b...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixe...
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset o...
We consider multi-armed bandit problems where the number of arms is larger than the possible number ...
We consider the Max K-Armed Bandit problem, where a learning agent is faced with several st...
We consider multi-armed bandit problems where the number of arms is larger tha...
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several o...
We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the...
In the classical multi-armed bandit problem, d arms are available to the decis...
We consider the top-k arm identification problem for multi-armed bandits with rewards belonging to a...