Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we look at the related problem of quickly learning near-optimal policies. The new problem is closely related to the previous one, but differs in important ways. We provide simple algorithms for MDPs and for zero-sum and common-payoff stochastic games, together with a uniform framework for proving their polynomial complexity. Unlike the previously studied problem, these bounds use the minimum of the mixing time and a new quantity, the spectral radius. Unlike the previous results, ours apply uniformly to the average-reward and discounted cases.
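The abstract above bounds complexity by the minimum of the mixing time and a spectral quantity, but does not define either here. As a minimal illustration of the standard spectral view of mixing (not the paper's own definition of "spectral radius", which is a new quantity introduced there), the sketch below computes the second-largest eigenvalue modulus (SLEM) of a small stochastic matrix; the quantity 1/(1 - SLEM) is a common proxy for how fast the chain mixes. The matrix `P` is a made-up example.

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

# For a stochastic matrix the largest eigenvalue is always 1;
# the second-largest modulus (SLEM) governs how fast the chain
# converges to its stationary distribution.
eigvals = np.linalg.eigvals(P)
slem = sorted(abs(eigvals), reverse=True)[1]

# A common spectral proxy for the mixing time: 1 / (1 - SLEM).
mixing_proxy = 1.0 / (1.0 - slem)
print(round(slem, 4), round(mixing_proxy, 3))
```

A smaller SLEM means faster mixing, which is why mixing-time-style bounds for reinforcement learning are often stated in terms of such spectral gaps.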
Solving multi-agent reinforcement learning problems has proven difficult because of the lack of trac...
We present new results on the efficiency of no-regret algorithms in the context of multiagent learn...
In this work we offer an (Formula presented.) pseudo-polynomial time deterministic algorithm for sol...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on ...
We present a new algorithm for polynomial time learning of optimal behavior in single-contro...
We present a new algorithm for polynomial time learning of optimal behavior in stochastic games. Thi...
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) tha...
Reinforcement learning with function approximation has recently achieved tremendous results in appli...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable ...
Preprint arXiv:1310.4953. Recent results of Ye and of Hansen, Miltersen and Zwick show that policy iterat...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
We present a novel and uniform formulation of the problem of reinforcement learning against bounded ...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...