Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration. NeuralRBMLE leverages the representation power of neural networks and directly encodes exploratory behavior in the parameter space, without constructing confidence intervals of the estimated rewards. We propose two variants of NeuralRBMLE algorithms: the first variant directly obtains the RBMLE estimator by gradient ascent, and the second variant simplifies RBMLE to a simple index policy through an approximation.
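As a rough illustration of the first (gradient-ascent) variant, the following minimal PyTorch sketch maximizes a reward-biased log-likelihood. It assumes Gaussian rewards (so the log-likelihood reduces to a negative squared loss), a small MLP reward model, and a bias weight alpha_t that grows with the round index; the names RewardNet, biased_mle_step, and select_arm, along with the step counts and schedules, are illustrative assumptions and not the paper's exact algorithm.

# Sketch of reward-biased MLE with a neural reward model (assumptions noted above).
import copy

import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP mapping a context-action feature vector to a scalar reward estimate."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def biased_mle_step(model, optimizer, contexts, rewards, candidate, alpha_t, reg=1e-3):
    """One gradient-ascent step on the biased objective:
    maximize log-likelihood(history) + alpha_t * f(candidate; theta) - reg * ||theta||^2.
    Under Gaussian noise the log-likelihood is a negative squared loss (up to constants)."""
    optimizer.zero_grad()
    nll = ((model(contexts) - rewards) ** 2).mean()            # data-fit term
    bias = alpha_t * model(candidate.unsqueeze(0)).squeeze()   # reward bias that drives exploration
    l2 = reg * sum((p ** 2).sum() for p in model.parameters())
    loss = nll - bias + l2                                     # minimizing this maximizes the biased likelihood
    loss.backward()
    optimizer.step()
    return loss.item()


def select_arm(base_model, contexts, rewards, arm_features, alpha_t, steps=20, lr=1e-2):
    """For each candidate arm, run a few biased-MLE gradient steps from a shared model,
    then pick the arm whose biased reward estimate is largest."""
    scores = []
    for feat in arm_features:
        model = copy.deepcopy(base_model)                      # separate biased estimator per arm
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(steps):
            biased_mle_step(model, opt, contexts, rewards, feat, alpha_t)
        scores.append(model(feat.unsqueeze(0)).item())
    return int(torch.tensor(scores).argmax())


if __name__ == "__main__":
    dim, n_arms, t = 8, 4, 50
    torch.manual_seed(0)
    model = RewardNet(dim)
    contexts = torch.randn(t, dim)    # past context-action features
    rewards = torch.randn(t)          # observed rewards for those features
    arms = torch.randn(n_arms, dim)   # current round's candidate features
    alpha_t = (t + 1) ** 0.5          # growing bias weight (one common RBMLE-style choice)
    print("chosen arm:", select_arm(model, contexts, rewards, arms, alpha_t))

The point of the sketch is that exploration enters only through the alpha_t-weighted bias on the candidate arm's predicted reward, so no confidence interval around the estimate is ever constructed.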
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed...
Contextual bandits aim to identify among a set of arms the optimal one with the highest reward based...
The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems i...
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control l...
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control li...
Multi-armed bandit is a well-formulated test bed ...
Recent works on neural contextual bandit have achieved compelling performances thanks to their abili...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithm...
This paper presents a new contextual bandit algorithm, NeuralBandit, which doe...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail propert...
The bandit problem models a sequential decision process between a player and an environment. In the ...
We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samp...