Truncating Trajectories in Monte Carlo Reinforcement Learning

Poiani Riccardo
Metelli Alberto Maria
Restelli Marcello

Publication date

January 2023

Publisher

PMLR

Language

English

Abstract

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return. In practice, in many tasks of interest, such as policy optimization, the agent usually spends its interaction budget by collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). However, given the discounted nature of the RL objective, this data collection strategy might not be the best option. Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of d...
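The claim that early rewards weigh exponentially more can be made concrete with the standard geometric-series bound. The sketch below assumes every reward satisfies |r_t| ≤ r_max and the discount factor γ < 1. Under those assumptions, everything collected from step T onward contributes at most γ^T · r_max / (1 − γ) to the return. The helper names (`discounted_return`, `tail_bound`, `min_horizon`) are hypothetical. This illustrates the intuition only and is not the paper's budget allocation strategy.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of gamma**t * r_t over a (possibly truncated) trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def tail_bound(horizon, gamma, r_max):
    """Upper bound on |sum_{t >= horizon} gamma**t * r_t| when |r_t| <= r_max.

    This is the geometric series gamma**horizon * r_max * (1 + gamma + gamma**2 + ...).
    """
    return gamma ** horizon * r_max / (1.0 - gamma)


def min_horizon(gamma, r_max, eps):
    """Smallest horizon T whose tail bound is <= eps (hypothetical helper)."""
    T = 0
    while tail_bound(T, gamma, r_max) > eps:
        T += 1
    return T


if __name__ == "__main__":
    gamma = 0.9
    # Weight of the reward at step t shrinks geometrically: 1, 0.9, 0.81, ...
    print([round(gamma ** t, 4) for t in range(5)])
    # Truncating a long trajectory of unit rewards after 20 steps changes the
    # return by no more than the tail bound.
    full = discounted_return([1.0] * 1000, gamma)
    truncated = discounted_return([1.0] * 20, gamma)
    print(full - truncated, tail_bound(20, gamma, 1.0))
    # Horizon needed so that the ignored tail is at most 0.01 when gamma = 0.99.
    print(min_horizon(0.99, 1.0, 0.01))
```

Because the tail bound decays like γ^T, steps beyond a moderate horizon can only change the return by a small, controllable amount. This is consistent with the abstract's point that fixed-length episodes may spend interaction budget on steps that matter little to the discounted objective.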
