The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as \emph{trajectory feedback}. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown tran...
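As a rough illustration of the least-squares idea mentioned above, the sketch below recovers per-pair rewards when only the sum of rewards over each trajectory is observed. It is a minimal example under stated assumptions, not the authors' algorithm: the uniform sampling of state-action pairs, the Gaussian noise level, the ridge term `reg`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sample_visit_counts(rng, n_pairs, horizon):
    """Visit-count vector of one trajectory (assumption: pairs drawn uniformly)."""
    visited = rng.integers(0, n_pairs, size=horizon)
    return np.bincount(visited, minlength=n_pairs).astype(float)

def estimate_rewards(counts, scores, reg=1e-3):
    """Ridge-regularized least squares: solve (X^T X + reg * I) r = X^T y."""
    n_pairs = counts.shape[1]
    gram = counts.T @ counts + reg * np.eye(n_pairs)
    return np.linalg.solve(gram, counts.T @ scores)

rng = np.random.default_rng(0)
n_pairs, horizon, n_traj = 6, 5, 500  # hypothetical problem sizes

# Unknown per-pair rewards that the learner never observes directly.
r_true = rng.uniform(0.0, 1.0, size=n_pairs)

# Row i counts how often trajectory i visited each state-action pair.
X = np.stack([sample_visit_counts(rng, n_pairs, horizon) for _ in range(n_traj)])

# Trajectory feedback: one noisy score per trajectory, the sum of its rewards.
y = X @ r_true + rng.normal(0.0, 0.1, size=n_traj)

r_hat = estimate_rewards(X, y)
max_err = float(np.max(np.abs(r_hat - r_true)))
print("max abs estimation error:", max_err)
```

Because each score is linear in the visit counts, the reward vector is identifiable once the observed count vectors span all state-action pairs; the ridge term only keeps the linear system well conditioned before that happens.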
In performative prediction, the deployment of a predictive model triggers a shift in the data distri...
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated tr...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of ever...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
Applying conventional reinforcement to complex domains requires the use of an overly simplified task...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
We study online reinforcement learning for finite-horizon deterministic control systems with arbitra...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do n...
Reinforcement learning involves the study of how to solve sequential decision-making problems using ...
We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving a...
One of the most challenging types of environments for a Deep Reinforcement Learning agent to learn i...