Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that allows flexible-length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, yield sharper guarantees than previously available in the literature. We evaluate our algorithms on a number of real-world datasets and show significantly improved empirical performance compared to known cascading bandit baselines.
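To make the cascade feedback model concrete, below is a minimal sketch of a generic CascadeUCB1-style learner for Bernoulli click feedback with fixed-length lists. It is an illustrative baseline only, not the flexible-length algorithm proposed here; the function and parameter names (`cascade_ucb1`, `click_prob`, `list_len`) are assumptions made for the example.

```python
import math
import random

def cascade_ucb1(n_items, horizon, list_len, click_prob):
    """Illustrative CascadeUCB1-style sketch (not the paper's algorithm).

    click_prob: true attraction probabilities, used only to simulate feedback.
    """
    counts = [0] * n_items   # number of times each item has been observed
    means = [0.0] * n_items  # empirical attraction-probability estimates
    total_clicks = 0

    # Initialization: observe each item once via a simulated single-item list.
    for i in range(n_items):
        counts[i] = 1
        means[i] = 1.0 if random.random() < click_prob[i] else 0.0

    for t in range(1, horizon + 1):
        # Upper confidence bound for each item's attraction probability.
        ucb = [means[i] + math.sqrt(1.5 * math.log(t) / counts[i])
               for i in range(n_items)]
        # Recommend the list_len items with the largest UCB indices.
        ranked = sorted(range(n_items), key=lambda i: ucb[i], reverse=True)[:list_len]

        # Cascade feedback: the user scans top-down and clicks the first attractive item.
        click_pos = None
        for pos, item in enumerate(ranked):
            if random.random() < click_prob[item]:
                click_pos = pos
                break

        # Items up to and including the click are observed; items below it are not.
        last_observed = click_pos if click_pos is not None else list_len - 1
        for pos in range(last_observed + 1):
            item = ranked[pos]
            reward = 1.0 if pos == click_pos else 0.0
            counts[item] += 1
            means[item] += (reward - means[item]) / counts[item]

        total_clicks += 1 if click_pos is not None else 0

    return total_clicks
```

Under the cascade assumption, unclicked items below the click position carry no information, which is why only the observed prefix of the list updates the estimates in the sketch above.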