We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ regret over $T$ episodes for a horizon-$H$ MDP under ($d$-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the f...
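The abstract mentions "weighted regression-based upper and lower bounds" but gives no formulas. The Python below is a minimal sketch for the $d$-dimensional linear case only. It shows a generic variance-weighted ridge regression plus an elliptical confidence width, which together yield one upper and one lower estimate. It is not the VO$Q$L algorithm itself: the variance estimates, bonus scaling, and $Q$-learning updates are not given here. Everything named below is a hypothetical illustration choice:

- the helper names `weighted_ridge` and `confidence_bounds`;
- the variance floor `sigma_min`;
- the width multiplier `beta`.

```python
import numpy as np


def weighted_ridge(features, targets, sigmas, lam=1.0, sigma_min=1e-3):
    """Variance-weighted ridge regression (hypothetical helper, linear case).

    Solves argmin_w  sum_i (phi_i^T w - y_i)^2 / sigma_i^2 + lam * ||w||^2
    and returns the estimate w together with the weighted Gram matrix.
    """
    phi = np.asarray(features, dtype=float)  # shape (n, d)
    y = np.asarray(targets, dtype=float)  # shape (n,)
    # Floor the per-sample standard deviations (assumed safeguard) so no
    # single sample receives an unbounded weight 1 / sigma_i^2.
    sig = np.maximum(np.asarray(sigmas, dtype=float), sigma_min)
    weights = 1.0 / sig**2
    d = phi.shape[1]
    # Weighted Gram matrix: lam * I + sum_i weight_i * phi_i phi_i^T
    gram = lam * np.eye(d) + phi.T @ (weights[:, None] * phi)
    # Weighted moment vector: sum_i weight_i * y_i * phi_i
    moment = phi.T @ (weights * y)
    w_hat = np.linalg.solve(gram, moment)
    return w_hat, gram


def confidence_bounds(phi_query, w_hat, gram, beta):
    """Upper and lower estimates phi^T w_hat +/- beta * ||phi||_{gram^{-1}}."""
    phi_query = np.asarray(phi_query, dtype=float)
    center = float(phi_query @ w_hat)
    # Elliptical width sqrt(phi^T gram^{-1} phi), scaled by beta (assumed given).
    width = beta * float(np.sqrt(phi_query @ np.linalg.solve(gram, phi_query)))
    return center + width, center - width
```

For example, `w_hat, gram = weighted_ridge(X, y, sigmas)` fits the weighted estimate. `confidence_bounds(phi, w_hat, gram, beta)` then returns an optimistic and a pessimistic value at feature vector `phi`. Weighting each sample by $1/\sigma_i^2$ lets low-noise targets contribute more to the Gram matrix. Flooring $\sigma_i$ keeps any single sample from dominating it.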
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some mea...
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy...
Reinforcement Learning (RL) has achieved tremendous empirical successes in real-world decision-makin...
We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. ...
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogen...
We study reinforcement learning in an infinite-horizon average-reward setting with linear function a...
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balanc...
With the increasing need for handling large state and action spaces, general function approximation ...
We consider an agent interacting with an environment in a single stream of act...
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
Reward discounting has become an indispensable ingredient in designing practical reinforcement learn...
We study online reinforcement learning for finite-horizon deterministic control systems with arbitra...
We consider infinite-horizon linear Markov Decision Processes (MDPs), where the transition proba...