In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected return, i.e., the expected cumulative discounted sum of an external reward signal. In many tasks of interest, such as policy optimization, the agent typically spends its interaction budget collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). Given the discounted nature of the RL objective, however, this data collection strategy might not be the best option: rewards collected in early simulation steps weigh exponentially more than later ones. Building on this intuition, in this paper we design an a-priori budget allocation strategy that leads to the collection of trajectories of d...
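The intuition that early rewards dominate the discounted objective can be made concrete with a small sketch. The Python snippet below is only an illustration and not the allocation strategy proposed in the paper: the function names `discounted_return` and `discount_mass_fraction`, and the values of the discount factor and horizon, are assumptions chosen for the example.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of one trajectory: sum_t gamma^t * rewards[t]."""
    total = 0.0
    # Accumulate backwards so each step is one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def discount_mass_fraction(gamma: float, horizon: int, steps: int) -> float:
    """Share of the total discount weight sum_{t<horizon} gamma^t
    that is carried by the first `steps` time steps.

    Illustrative helper (hypothetical name), assuming 0 < gamma < 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not 0 <= steps <= horizon or horizon == 0:
        raise ValueError("need 0 <= steps <= horizon and horizon > 0")
    # Ratio of two geometric sums; the common factor 1/(1 - gamma) cancels.
    return (1.0 - gamma**steps) / (1.0 - gamma**horizon)


if __name__ == "__main__":
    # Assumed example values: gamma = 0.9, fixed episode length of 100 steps.
    for steps in (10, 25, 50, 100):
        share = discount_mass_fraction(0.9, 100, steps)
        print(f"first {steps:3d} steps carry {share:.1%} of the discount weight")
```

Under these assumed values, the first tenth of a fixed-length episode already carries roughly two thirds of the discount weight, which is the kind of imbalance that motivates collecting trajectories of different lengths rather than spending the whole budget on full-length episodes.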
Unlike classic Supervised Learning, Reinforcement Learning (RL) is fundamentally interactiv...
This paper investigates to what extent one can improve reinforcement learning algorithms. Our study ...
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) h...
In Reinforcement Learning (RL), Monte Carlo methods teach the agents how to make decisions based on ...
We propose an algorithm for estimating the finite-horizon expected return of a closed l...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
Policy-based reinforcement learning algorithms are widely used in various fields. Among them, mainst...
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative rew...
The most relevant problems in discounted reinforcement learning involve estimating the mean of a fun...
In reinforcement learning, an agent collects information interacting with an e...
The quintessential model-based reinforcement-learning agent iteratively refines its estimates or pri...
Agents acting in real-world scenarios often have constraints such as finite budgets or daily job per...
This paper introduces a set of algorithms for Monte-Carlo Bayesian reinforcement learning. Firstly, ...
In this article we study the connection of stochastic optimal control and reinforcement learning. Ou...