In this paper we consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We study the method of value-oriented successive approximations, which has been extensively studied by Van Nunen for the total-reward case. Under various conditions that guarantee the gain of the process to be independent of the starting state, together with a strong aperiodicity assumption, we show that the method converges and produces ε-optimal policies.
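The method named in this abstract alternates one policy-improvement step with several policy-evaluation sweeps before improving again. The following is a minimal sketch of that scheme for a toy average-reward MDP, using relative value iteration with a span stopping criterion; all numerical data (the transition matrices, rewards, and the choice k = 5) are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Toy 2-state, 2-action average-reward MDP; all numbers are invented.
# Every transition probability is strictly positive, so the strong
# aperiodicity assumption holds and the gain is state-independent.
nS = 2
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])    # P[a, s, s']
r = np.array([[1.0, 0.0],
              [2.0, 0.5]])      # r[a, s]

def value_oriented_step(v, k):
    """One improvement step followed by k evaluation sweeps of the
    greedy policy (k = 0 recovers ordinary value iteration)."""
    q = r + P @ v                              # q[a, s]
    f = q.argmax(axis=0)                       # greedy policy at v
    v = q.max(axis=0)                          # improvement step
    Pf, rf = P[f, np.arange(nS)], r[f, np.arange(nS)]
    for _ in range(k):                         # value-oriented sweeps
        v = rf + Pf @ v
    return v

k = 5
v = np.zeros(nS)
for _ in range(500):
    w = value_oriented_step(v, k)
    d = w - v                                  # per-step increments
    v = w - w.min()                            # keep relative values
    if d.max() - d.min() < 1e-10:              # span criterion met
        break

# Each step applies k + 1 one-step operators, so at convergence every
# component of d equals (k + 1) times the optimal gain.
gain = d.max() / (k + 1)
print(gain)
```

Normalising by `w.min()` after each step only shifts the values by a constant, so it changes neither the greedy policy nor the span of the increments; it merely keeps the iterates bounded while the gain estimate is read off from `d`.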
The first part of this survey paper is devoted to derive under rather weak conditions, which don't g...
We consider the Markov decision process with finite state and action spaces at the criterion of aver...
Markov decision processes which allow for an unbounded reward structure are considered. Conditions a...
This paper considers two-person zero-sum Markov games with finitely many states and actions with the...
The aim of this paper is to give an overview of recent developments in the area of successive approx...
The aim of this paper is to give a survey of recent developments in the area of successive approxima...
This paper presents a policy improvement-value approximation algorithm for the average reward Markov...