Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the neural contextual bandit problem from a distributional perspective and proposes NeuralRBMLE, which leverages the likelihood of surrogate parametric distributions to learn the unknown reward distributions and thereafter adapts the RBMLE principle to achieve efficient exploration by properly adding a reward-bias term. NeuralRBMLE leverages the representation power of neural networks and directly encodes exploratory behavior in the parameter space, without constructing confidence intervals of the estimated rewards. We propose two variants of NeuralRBMLE algorithms: The fir...
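To make the reward-bias idea concrete, here is a minimal sketch in a much simpler setting than the neural contextual one studied in the paper: a multi-armed bandit with unit-variance Gaussian rewards. Under that assumption, and ignoring the constraint that the biased arm actually be the best one, maximizing the log-likelihood plus alpha(t) times an arm's mean reduces to the per-arm index mean + alpha(t) / (2 * pulls). The function names, the bias schedule alpha(t) = sqrt(t), and the test environment are illustrative assumptions, not the NeuralRBMLE algorithm itself.

```python
import math
import random


def rbmle_index(mean, pulls, alpha):
    """Reward-biased index for one arm, assuming unit-variance Gaussian rewards.

    Maximizing  -pulls * (theta - mean)**2 / 2 + alpha * theta  over theta
    gives theta = mean + alpha / pulls; comparing the maximized values across
    arms is equivalent to comparing  mean + alpha / (2 * pulls).
    """
    return mean + alpha / (2.0 * pulls)


def run_rbmle(true_means, horizon, seed=0):
    """Run the index policy on a simulated Gaussian bandit (hypothetical setup).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Assumed bias schedule, chosen only for illustration.
            alpha = math.sqrt(t)
            arm = max(
                range(k),
                key=lambda i: rbmle_index(sums[i] / pulls[i], pulls[i], alpha),
            )
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

Calling `run_rbmle([0.1, 0.9], horizon=200)` returns the per-arm pull counts. The bias term shrinks as an arm is pulled more often, so exploration comes from the index itself rather than from an explicit confidence interval.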
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed...
Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy sc...
In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context ba...
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control li...
Multi-armed bandit is a well-formulated test bed ...
Recent works on neural contextual bandit have achieved compelling performances thanks to their abili...
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithm...
This paper presents a new contextual bandit algorithm, NeuralBandit, which doe...
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail propert...
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated tra...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Contextual bandits aim to identify among a set of arms the optimal one with the highest reward based...
Learning action policy for autonomous agents in a decentralized multi-agent environment has remained...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...