Convex Q Learning in a Stochastic Environment: Extended Version

Lu, Fan
Meyn, Sean

Publication date

September 2023

Language

English

Abstract

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program and the solution to standard Q-learning. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounde...
