Solving Markov decision processes (MDPs) efficiently is challenging in many cases, for example, when the state space or action space is large, when the reward function is sparse and delayed, and when there is a distribution of MDPs. Structures in the policy, value function, reward function, or state space can be useful in accelerating the learning process. In this thesis, we exploit structures in MDPs to solve them effectively and efficiently. First, we study problems with concave value function and basestock policy and leverage these two structures to propose an approximate dynamic programming (ADP) algorithm. Next, we study the exploration problem in unknown MDPs, introduce structured intrinsic reward to the problem, and propose a Bayes-o...
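To make the first of these structures concrete, here is a minimal sketch, assuming a toy single-item, zero-lead-time inventory MDP with lost sales; the costs, demand distribution, and value-iteration solver are invented for illustration and are not the thesis's ADP algorithm. The point is only that the optimal value function of such a problem is concave in the inventory level and the optimal policy is a basestock (order-up-to) policy, which is the structure the abstract says the ADP method exploits.

    # Illustrative sketch only: toy inventory MDP with invented parameters.
    import numpy as np

    S_MAX, GAMMA = 20, 0.95
    ORDER_COST, HOLD_COST, SHORT_COST = 1.0, 0.5, 4.0
    demand = np.arange(9)
    prob = np.array([0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.04, 0.01])

    def one_step(s, a):
        """Expected reward and next-state distribution when ordering a units at inventory s."""
        r = -ORDER_COST * a
        nxt = np.zeros(S_MAX + 1)
        for d, p in zip(demand, prob):
            stock = s + a
            left = stock - min(stock, d)          # unsold units carried over
            r += p * (-HOLD_COST * left - SHORT_COST * max(d - stock, 0))
            nxt[left] += p
        return r, nxt

    def q_value(s, a, V):
        r, nxt = one_step(s, a)
        return r + GAMMA * nxt @ V

    V = np.zeros(S_MAX + 1)
    for _ in range(1000):                          # plain value iteration
        V_new = np.array([max(q_value(s, a, V) for a in range(S_MAX - s + 1))
                          for s in range(S_MAX + 1)])
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new

    # The greedy policy should order "up to" a fixed level while inventory is
    # low (s + a* constant), and V should be concave in s.  A structure-aware
    # ADP method can search over basestock levels / concave approximations
    # instead of the full tabular space.
    policy = [max(range(S_MAX - s + 1), key=lambda a: q_value(s, a, V))
              for s in range(S_MAX + 1)]
    print("order-up-to levels:", [s + a for s, a in enumerate(policy)])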
In this paper we describe recent progress in our work on Value Function Discovery (VFD), a novel me...
We introduce a class of Markov decision problems (MDPs) which greatly simplify Reinforcement Learnin...
This chapter presents an overview of simulation-based techniques useful for solving Markov decision ...
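As a generic illustration of what "simulation-based" means in this context, the sketch below runs tabular Q-learning, one standard such technique, on a randomly generated MDP. The MDP, the constants, and the epsilon-greedy exploration rule are assumptions chosen for the example, not the specific algorithms covered in the chapter.

    # Generic simulation-based solution method: tabular Q-learning on a
    # random MDP.  The learner only ever sees sampled transitions from step().
    import numpy as np

    rng = np.random.default_rng(0)
    N_S, N_A, GAMMA = 10, 3, 0.95
    P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))  # P[s, a] = next-state distribution
    R = rng.normal(size=(N_S, N_A))                   # expected immediate rewards

    def step(s, a):
        """Simulate one transition with noisy reward."""
        s_next = rng.choice(N_S, p=P[s, a])
        return R[s, a] + 0.1 * rng.normal(), s_next

    Q = np.zeros((N_S, N_A))
    s, alpha, eps = 0, 0.1, 0.1
    for t in range(200_000):
        a = rng.integers(N_A) if rng.random() < eps else int(np.argmax(Q[s]))
        r, s_next = step(s, a)
        Q[s, a] += alpha * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

    print("greedy policy:", Q.argmax(axis=1))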
This dissertation investigates the problem of representation discovery in discrete Markov decision p...
We present a hierarchical reinforcement learning framework that formulates each task in the hierarch...
University of Minnesota M.S. thesis. June 2012. Major: Computer science. Advisor: Prof. Paul Schrate...
The running time of classical algorithms for Markov Decision Processes (MDPs) typically grows li...
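For context, the scaling this abstract presumably refers to is the cost of sweeping the full state and action spaces. The sketch below is an illustrative dense value iteration (not that paper's algorithm): each sweep touches every (state, action, next-state) triple, so its per-iteration cost is O(|S|^2 |A|) with dense transition matrices.

    # Illustrative only: classical value iteration over dense transition tensors.
    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=10_000):
        """P: (|S|, |A|, |S|) transition tensor, R: (|S|, |A|) expected rewards."""
        n_s, n_a, _ = P.shape
        V = np.zeros(n_s)
        for _ in range(max_iter):
            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']  ->  |S|^2 |A| work per sweep
            Q = R + gamma * P @ V
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new
        return V, Q.argmax(axis=1)

    rng = np.random.default_rng(1)
    n_s, n_a = 50, 4
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))
    R = rng.normal(size=(n_s, n_a))
    V, pi = value_iteration(P, R)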
We present an approximation scheme for solving Markov Decision Processes (MDPs) in whi...
This paper provides new techniques for abstracting the state space of a Markov Decision Process (MD...
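As a generic reference point (not the abstraction techniques proposed in that paper), plain state aggregation works as sketched below: ground states mapped to the same abstract state are averaged into one, and the smaller aggregated MDP is solved in place of the original. The uniform weighting and the mapping phi are assumptions of this example.

    # Generic state-aggregation sketch: collapse ground states into abstract
    # states via a mapping phi, averaging rewards and transitions per block.
    import numpy as np

    def aggregate(P, R, phi, n_abs):
        """P: (|S|, |A|, |S|), R: (|S|, |A|), phi: ground state -> abstract state id."""
        n_s, n_a, _ = P.shape
        P_abs = np.zeros((n_abs, n_a, n_abs))
        R_abs = np.zeros((n_abs, n_a))
        counts = np.zeros(n_abs)
        for s in range(n_s):
            counts[phi[s]] += 1
            R_abs[phi[s]] += R[s]
            for s2 in range(n_s):
                P_abs[phi[s], :, phi[s2]] += P[s, :, s2]
        # Uniform weighting over the ground states in each abstract block.
        R_abs /= counts[:, None]
        P_abs /= counts[:, None, None]
        return P_abs, R_abs

    # Usage: solve the aggregated MDP with any exact method (e.g. value
    # iteration) and lift the abstract policy back via pi(s) = pi_abs(phi[s]).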
Sequential decision making is a fundamental task faced by any intelligent agent in an extended inter...