The time average reward for a discrete-time controlled Markov process subject to a time-average cost constraint is maximized over the class of all causal policies. Each epoch, a reward depending on the state and action is earned, and a similarly constituted cost is assessed; the time average of the former is maximized, subject to a hard limit on the time average of the latter. It is assumed that the state space is finite and the action space is compact metric. An accessibility hypothesis makes it possible to utilize a Lagrange multiplier formulation involving the dynamic programming equation, thus reducing the optimization problem to an unconstrained optimization parametrized by the multiplier. The parametrized dynamic programming equ...
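The Lagrangian reduction described in that abstract admits a standard formalization; the display below is only an illustrative sketch in generic MDP notation (the symbols r, c, α, λ, ρ, h, p, A(x), and X are assumptions of this sketch, not the paper's own notation), showing the constrained average-reward problem and the multiplier-parametrized average-reward dynamic programming equation to which it is reduced.

\[
\max_{\text{causal policies}} \ \liminf_{T\to\infty} \frac{1}{T}\,\mathbb{E}\Bigl[\sum_{t=0}^{T-1} r(x_t,a_t)\Bigr]
\quad \text{subject to} \quad
\limsup_{T\to\infty} \frac{1}{T}\,\mathbb{E}\Bigl[\sum_{t=0}^{T-1} c(x_t,a_t)\Bigr] \le \alpha ,
\]
and, for a fixed multiplier \(\lambda \ge 0\), the unconstrained parametrized dynamic programming (optimality) equation
\[
\rho(\lambda) + h_\lambda(x) \;=\; \max_{a \in A(x)} \Bigl[\, r(x,a) - \lambda\, c(x,a) + \sum_{y \in X} p(y \mid x,a)\, h_\lambda(y) \Bigr], \qquad x \in X ,
\]
where \(\rho(\lambda)\) is the optimal unconstrained time-average of the Lagrangian reward \(r - \lambda c\) and \(h_\lambda\) is a relative value function.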
This paper studies the problem of the existence of stationary optimal policies for finit...
This paper deals with the average expected reward criterion for continuous-time Markov decis...
The first part considers discrete-time constrained Markov Decision Processes (MDPs). At each epoch, ...
In a controlled Markov set-chain with finite state and action spaces, we find a policy, called avera...
The average cost criterion has held great intuitive appeal and has attracted considerable attention...
In this paper, we develop a method to automatically generate a control policy for a dyna...
The long-run average cost control problem for discrete time Markov chains on a countable state space...
This paper deals with discrete-time Markov Decision Processes (MDP's) under constraints where all th...
In this paper, we focus on formal synthesis of control policies for finite Markov decisio...
The paper studies optimization of average-reward continuous-time finite state and action Markov Deci...