A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials

Konstantinos Chatzilygeroudis
Vassilis Vassiliades
Freek Stulp
Sylvain Calinon
Jean-Baptiste Mouret

Publication date

April 2020

DOI

10.1109/TRO.2019.2958211

Abstract

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word “big-data”, we refer to this challenge as “microdata reinforcement learning”. We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e....
