Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies. Taking a similar perspective on partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective subject to polynomial constraints. Building on this formulation, we present an approach for Reward Optimization in State-Action space (ROSA). We evaluate this approach experimentally on maze navigation tasks and find that ROSA is computationally efficient and can yield stability improvements over existing methods. Comment: Accepted as an extended abstract at RLDM 2022, 5 pages, 2 figures
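The equivalence mentioned in this abstract can be illustrated with a small sketch. The following is not the paper's ROSA implementation; it is the classical dual linear program for a discounted, fully observable MDP, maximizing expected reward over the polytope of discounted state-action frequencies mu(s, a). The two-state MDP (transition tensor `P`, rewards `r`, initial distribution `rho`) is a made-up toy example.

```python
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] = transition probability; r[s, a] = reward (hypothetical values)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
rho = np.array([0.5, 0.5])  # initial state distribution

# Flow-conservation constraints defining the state-action polytope:
#   sum_a mu(s', a) - gamma * sum_{s,a} P(s'|s,a) mu(s, a) = (1 - gamma) rho(s')
A_eq = np.zeros((n_states, n_states * n_actions))
for sp in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            idx = s * n_actions + a
            A_eq[sp, idx] -= gamma * P[s, a, sp]
            if s == sp:
                A_eq[sp, idx] += 1.0
b_eq = (1 - gamma) * rho

# linprog minimizes, so negate the reward objective; mu >= 0 via bounds.
res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(n_states, n_actions)
# An optimal memoryless policy is recovered by conditioning mu on the state.
policy = mu / mu.sum(axis=1, keepdims=True)
```

Summing the constraints over all states shows that any feasible mu sums to 1, so the optimizer returns a valid (normalized) discounted occupancy measure. In the partially observable case treated by the abstract, the analogous feasible set is no longer a polytope, which is what leads to polynomial rather than linear constraints.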
We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. Eve...
In a partially observable Markov decision process (POMDP), if the reward can be observed at each ste...
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning dom...
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various...
It was recently shown that computing an optimal stochastic controller in a discounted infinite-hor...
Planning plays an important role in the broad class of decision theory. Planning has drawn much atte...
Infinite-horizon non-stationary Markov decision processes provide a general framework to model many ...
Sequential decision making is a fundamental task faced by any intelligent agent in an extended inter...
We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives....
Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) have been proposed as a fram...
Markovian systems are widely used in reinforcement learning (RL), when the suc...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average ...
We consider an approximation scheme for solving Markov decision processes (MDPs) with counta...