This paper presents a new contextual bandit algorithm, NeuralBandit, which requires no stationarity assumption on contexts or rewards. Several neural networks are trained to model the expected reward given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset with and without stationarity of rewards.
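The approach described above — per-arm neural networks estimating reward from context, updated online — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class names, the epsilon-greedy exploration rule, and all hyperparameters (hidden size, learning rate) are assumptions made for the sketch; the paper's multi-expert variants for selecting perceptron parameters are not reproduced here.

```python
import numpy as np

class TinyRewardNet:
    """One-hidden-layer MLP trained by online SGD on squared error.
    Hidden size and learning rate are illustrative choices, not the paper's."""
    def __init__(self, dim, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.w2 @ h + self.b2

    def update(self, x, reward):
        # Forward pass, then backpropagate the squared-error gradient.
        h = np.tanh(self.W1 @ x + self.b1)
        err = (self.w2 @ h + self.b2) - reward   # d(loss)/d(prediction)
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        dh = err * self.w2 * (1.0 - h ** 2)      # tanh derivative
        self.W1 -= self.lr * np.outer(dh, x)
        self.b1 -= self.lr * dh

class NeuralBanditSketch:
    """One reward network per arm; epsilon-greedy exploration (an assumed
    exploration rule for this sketch); only the played arm is updated."""
    def __init__(self, n_arms, dim, epsilon=0.1, seed=0):
        self.nets = [TinyRewardNet(dim, seed=seed + a) for a in range(n_arms)]
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def choose(self, context):
        if self.rng.random() < self.epsilon:           # explore
            return int(self.rng.integers(len(self.nets)))
        scores = [net.predict(context) for net in self.nets]
        return int(np.argmax(scores))                  # exploit

    def learn(self, arm, context, reward):
        self.nets[arm].update(context, reward)
```

Because each arm's network is refit continuously from the most recent feedback, the estimates can track drifting reward functions, which is the property the abstract highlights for non-stationary settings.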
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly ...
Contextual bandit algorithms are essential for solving many real-world interac...
This paper presents a new reinforcement learning (RL) algorithm called Bellman...
Multi-armed bandit is a well-formulated test bed ...
In the classical conte...
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed...
Recent works on neural contextual bandit have achieved compelling performances thanks to their abili...
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control l...
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
Contextual bandits aim to identify among a set of arms the optimal one with the highest reward based...