The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, the minimax optimal sample complexity of obtaining a nearly optimal policy with sampling access to a generative model scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) w...
We consider the problem of model-free reinforcement learning in the Markovian decision processes (MD...
The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are r...
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose ...
We consider the problem of learning the optimal action-value function in disco...
We consider the problem of learning the optimal action-value function in the d...
With the increasing need for handling large state and action spaces, general function approximation ...
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogen...
Reinforcement learning (RL) is a machine learning paradigm where an agent learns to interact with an...
We consider the problem of learning the optimal action-value function in discounted-reward...
Three major challenges in reinforcement learning are the complex dynamical systems with large state ...
Reinforcement learning with function approximation has recently achieved tremendous results in appli...
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access t...
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balanc...
We present a new algorithm for general reinforcement learning where the true environment is known ...
Many physical systems have underlying safety considerations that require that the policy employed en...