Despite significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, existing works are either confined to the linear setting or incur an undesirable $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression for linear contextual bandits \citep{he2022nearly} and a new weighted estimator of uncertainty for the general function class. In contrast to th...
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BO...
We propose a novel algorithm for generalized linear contextual bandits (GLBs) with a regret bound su...
The multi-armed bandit (MAB) problem is a basic setting for sequential decision-making problems with par...
In this paper, we study the stochastic bandits problem with $k$ unknown heavy-tailed and corrupted rew...
We investigate the problem of corruption robustness in offline reinforcement learning (RL) with gene...
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where t...
We study online reinforcement learning in linear Markov decision processes with adversarial losses a...
Motivated by models of human decision making proposed to explain commonly observed deviations from c...
Motivated by models of human decision making proposed to explain commonly observed deviations from c...
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm...
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions i...
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail propert...
Contextual bandits are canonical models for sequential decision-making under uncertainty in environm...
50 pages; 4 figures. We investigate the regret-minimisation problem in a multi-armed bandit setting wi...
We study learning algorithms for the classical Markovian bandit problem with discount. We explain ho...