We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side information, or context, available to the decision-maker) in which the reward associated with each context-based decision may not always be observed ("missing rewards"). This new problem is motivated by certain online settings, including clinical trial and ad recommendation applications. To address the missing-reward setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as sparse coding. Unlike standard contextual bandit methods, we are able to learn from all contexts, even those with missing rewards, by improving the representation of a context (via dictionary ...
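To make the idea above concrete, here is a minimal sketch (not the authors' implementation) of one way to combine a LinUCB-style contextual bandit with online dictionary learning: the dictionary, and hence the sparse-code representation of contexts, is refined on every round, while the per-arm bandit statistics are updated only on rounds where the reward is observed. The class name, the simulated environment, the use of scikit-learn's MiniBatchDictionaryLearning, and all hyperparameter values are illustrative assumptions.

```python
# A minimal sketch, assuming LinUCB + online sparse coding: the representation
# keeps improving even on rounds whose reward is never revealed.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning


class MissingRewardLinUCB:
    """LinUCB over sparse codes; the dictionary is updated on every context."""

    def __init__(self, n_arms, n_atoms=32, ucb_alpha=1.0):
        self.n_arms = n_arms
        self.ucb_alpha = ucb_alpha
        # Unsupervised representation learner, refined with every context.
        self.dict_learner = MiniBatchDictionaryLearning(
            n_components=n_atoms, alpha=1.0, batch_size=1)
        # Per-arm ridge-regression statistics on the sparse code.
        self.A = [np.eye(n_atoms) for _ in range(n_arms)]
        self.b = [np.zeros(n_atoms) for _ in range(n_arms)]

    def encode(self, context):
        x = np.asarray(context, dtype=float).reshape(1, -1)
        # Every context, rewarded or not, contributes to the dictionary.
        self.dict_learner.partial_fit(x)
        return self.dict_learner.transform(x).ravel()

    def select_arm(self, code):
        scores = []
        for a in range(self.n_arms):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores.append(theta @ code
                          + self.ucb_alpha * np.sqrt(code @ A_inv @ code))
        return int(np.argmax(scores))

    def update(self, arm, code, reward):
        # Bandit statistics change only when the reward is observed; the
        # representation was already updated in encode().
        if reward is None:
            return
        self.A[arm] += np.outer(code, code)
        self.b[arm] += reward * code


# Hypothetical usage: rewards are observed only with probability 0.5.
rng = np.random.default_rng(0)
agent = MissingRewardLinUCB(n_arms=5)
for t in range(1000):
    context = rng.normal(size=100)        # placeholder context vector
    code = agent.encode(context)          # always updates the representation
    arm = agent.select_arm(code)
    reward = float(rng.random() < 0.3) if rng.random() < 0.5 else None
    agent.update(arm, code, reward)
```

The design choice this sketch illustrates is that a missing reward skips only the supervised (bandit) update; the unsupervised representation learner still benefits from the context, which is what allows learning from all contexts.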
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
We propose a new sequential decision-making setting, combining key aspects of two established online...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...
The data explosion and the development of artificial intelligence (AI) have fueled the demand for recomme...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly ...
Contextual multi-armed bandit (MAB) algorithms have shown promise for maximizing cumulative r...
This paper presents a new contextual bandit algorithm, NeuralBandit, which doe...
The bandit problem models a sequential decision process between a player and an environment. In the ...
The bandit classification problem considers learning the labels of a time-indexed data stream under ...
Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news a...
Contextual bandit algorithms are essential for solving many real-world interac...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...
Learning an action policy for autonomous agents in a decentralized multi-agent environment has remained...
In the classical conte...