Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration. NeuralRBMLE leverages the representation power of neural networks and directly encodes exploratory behavior in the parameter space, without constructing confidence intervals of the estimated rewards. We propose two variants of NeuralRBMLE algorithms: the first variant directly obtains the RBMLE estimator by gradient ascent, and the second variant simplifies RBMLE to a simple index policy through an approximation.
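As a rough illustration of the first (gradient-ascent) variant, the following minimal PyTorch sketch maximizes a reward-biased log-likelihood. It assumes Gaussian rewards (so the log-likelihood reduces to a negative squared loss), a small MLP reward model, and a bias weight alpha_t that grows with the round index; the names RewardNet, biased_mle_step, and select_arm, along with the step counts and schedules, are illustrative assumptions and not the paper's exact algorithm.

# Sketch of reward-biased MLE with a neural reward model (assumptions noted above).
import copy

import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP mapping a context-action feature vector to a scalar reward estimate."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def biased_mle_step(model, optimizer, contexts, rewards, candidate, alpha_t, reg=1e-3):
    """One gradient-ascent step on the biased objective:
    maximize log-likelihood(history) + alpha_t * f(candidate; theta) - reg * ||theta||^2.
    Under Gaussian noise the log-likelihood is a negative squared loss (up to constants)."""
    optimizer.zero_grad()
    nll = ((model(contexts) - rewards) ** 2).mean()            # data-fit term
    bias = alpha_t * model(candidate.unsqueeze(0)).squeeze()   # reward bias that drives exploration
    l2 = reg * sum((p ** 2).sum() for p in model.parameters())
    loss = nll - bias + l2                                     # minimizing this maximizes the biased likelihood
    loss.backward()
    optimizer.step()
    return loss.item()


def select_arm(base_model, contexts, rewards, arm_features, alpha_t, steps=20, lr=1e-2):
    """For each candidate arm, run a few biased-MLE gradient steps from a shared model,
    then pick the arm whose biased reward estimate is largest."""
    scores = []
    for feat in arm_features:
        model = copy.deepcopy(base_model)                      # separate biased estimator per arm
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(steps):
            biased_mle_step(model, opt, contexts, rewards, feat, alpha_t)
        scores.append(model(feat.unsqueeze(0)).item())
    return int(torch.tensor(scores).argmax())


if __name__ == "__main__":
    dim, n_arms, t = 8, 4, 50
    torch.manual_seed(0)
    model = RewardNet(dim)
    contexts = torch.randn(t, dim)    # past context-action features
    rewards = torch.randn(t)          # observed rewards for those features
    arms = torch.randn(n_arms, dim)   # current round's candidate features
    alpha_t = (t + 1) ** 0.5          # growing bias weight (one common RBMLE-style choice)
    print("chosen arm:", select_arm(model, contexts, rewards, arms, alpha_t))

The point of the sketch is that exploration enters only through the alpha_t-weighted bias on the candidate arm's predicted reward, so no confidence interval around the estimate is ever constructed.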
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed...
Contextual bandits aim to identify among a set of arms the optimal one with the highest reward based...
The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems i...
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control l...
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control li...
Multi-armed bandit is a well-formulated test bed ...
Recent works on neural contextual bandit have achieved compelling performances thanks to their abili...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithm...
This paper presents a new contextual bandit algorithm, NeuralBandit, which doe...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail propert...
The bandit problem models a sequential decision process between a player and an environment. In the ...
We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samp...