Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory when deriving value estimates, because of the challenges in dealing with the inherent bias and variance of $n$-step returns. We propose a bounding method that uses a negatively biased but relatively low-variance estimator, generated from a complex return, to provide a lower bound on the observed value of a traditional one-step return estimator. In addition, we develop a new Bounded FQI algorithm, which efficiently incorporates the bounding method into an AVI framework. Experiments show that our method produces more accurate value estimates than e...
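The bounding step can be sketched as follows. This is a minimal illustration under assumptions that do not come from the abstract. The complex return is taken to be a truncated λ-return, a weighted average of $n$-step returns, and it serves only as a placeholder for whatever negatively biased, low-variance estimator the paper actually uses. The helper names `n_step_returns`, `truncated_lambda_return`, and `bounded_target` are hypothetical. Whenever the bound exceeds the one-step target, the bound replaces it.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], bootstraps: Sequence[float], gamma: float) -> List[float]:
    """Return [G^(1), ..., G^(N)] for one starting sample of a trajectory.

    rewards[k] is r_{t+k}; bootstraps[k] is max_a Q(s_{t+k+1}, a), or 0.0 if s_{t+k+1} is terminal.
    G^(1) is the traditional one-step return r_t + gamma * max_a Q(s_{t+1}, a).
    """
    returns: List[float] = []
    discounted_rewards = 0.0
    for k, r in enumerate(rewards):
        discounted_rewards += (gamma ** k) * r
        returns.append(discounted_rewards + (gamma ** (k + 1)) * bootstraps[k])
    return returns


def truncated_lambda_return(returns: Sequence[float], lam: float) -> float:
    """Placeholder complex return (an assumption, not the paper's estimator).

    Weights are (1 - lam) * lam^(n-1) for n < N and lam^(N-1) for n = N, so they sum to 1.
    """
    n_max = len(returns)
    total = 0.0
    for n, g in enumerate(returns, start=1):
        weight = lam ** (n - 1) if n == n_max else (1.0 - lam) * lam ** (n - 1)
        total += weight * g
    return total


def bounded_target(one_step: float, lower_bound: float) -> float:
    """Raise the one-step target to the lower bound whenever it falls below it."""
    return max(one_step, lower_bound)


# Usage with hypothetical numbers for a 3-step trajectory segment.
g = n_step_returns([1.0, 0.0, 2.0], [0.5, 1.0, 0.0], gamma=0.9)
bound = truncated_lambda_return(g, lam=0.5)
target = bounded_target(g[0], bound)
```

Because the bound can only raise the target, the scheme depends on the bounding estimator being biased downward. Whether a λ-return has that property in a given problem is not established by this sketch.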
There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones tha...
We address the problem of non-convergence of online reinforcement learning algorithms (e.g., Q learn...
We consider batch reinforcement learning problems in continuous space, expected total dis...
This paper studies B-FQI, an Approximate Value Iteration (AVI) ...
Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effecti...
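The FQI loop can be sketched in a few lines. This is a minimal illustration under stated assumptions and is not the method of any specific paper listed here. Transitions are assumed to be `(s, a, r, s_next, terminal)` tuples with discrete, hashable states. The regression step uses a tabular averager, the per-pair mean of the targets, which is the least-squares fit for a lookup table. It stands in for the tree ensembles or neural networks usually used in FQI. The function name `fitted_q_iteration` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# (state, action, reward, next_state, terminal); states must be hashable.
Transition = Tuple[Hashable, int, float, Hashable, bool]


def fitted_q_iteration(
    transitions: Sequence[Transition],
    n_actions: int,
    gamma: float,
    n_iterations: int,
) -> Dict[Tuple[Hashable, int], float]:
    """Batch FQI with a tabular averager standing in for the usual regressor (e.g. tree ensembles)."""
    q: Dict[Tuple[Hashable, int], float] = {}

    def q_value(state: Hashable, action: int) -> float:
        # Reads the current binding of q; unseen (state, action) pairs default to 0.
        return q.get((state, action), 0.0)

    for _ in range(n_iterations):
        # 1. Regression targets y = r + gamma * max_b Q_k(s', b), or y = r at terminal transitions.
        sums: Dict[Tuple[Hashable, int], float] = defaultdict(float)
        counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)
        for s, a, r, s_next, terminal in transitions:
            y = r if terminal else r + gamma * max(q_value(s_next, b) for b in range(n_actions))
            sums[(s, a)] += y
            counts[(s, a)] += 1
        # 2. Fit Q_{k+1}: for a lookup table the least-squares fit is the mean target per (s, a).
        q = {key: sums[key] / counts[key] for key in sums}
    return q


# Usage on a hypothetical two-state problem: action 1 in state 0 moves to state 1,
# action 0 in state 1 ends the episode with reward 1; the other actions lead back to state 0.
toy = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 1.0, "end", True),
    (1, 1, 0.0, 0, False),
]
q_hat = fitted_q_iteration(toy, n_actions=2, gamma=0.9, n_iterations=50)
```

The whole batch is reused at every iteration, and only the regression targets change. Swapping the averager for a richer regressor leaves the loop unchanged.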
Approximate value iteration methods for reinforcement learning (RL) generalize experience...
Approximate Value Iteration (AVI) is a method for solving large Markov Decision Problems by approxim...
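The following is a minimal sketch of AVI. It assumes a small MDP with a known model, given as nested lists `P[s][a][t]` and `R[s][a]`, and uses state aggregation as the function approximator. These choices and the function names are illustrative and do not come from the abstract. Each iteration applies the exact Bellman backup $T$ and then projects the result onto the approximation space, giving $V_{k+1} = \Pi T V_k$. Aggregation replaces each state's value with its group mean. That operation is a max-norm non-expansion, so $\Pi T$ remains a $\gamma$-contraction and the iteration converges.

```python
from typing import Dict, List, Sequence


def bellman_backup(values: Sequence[float], P, R, gamma: float) -> List[float]:
    """Exact optimality backup (T V)(s) = max_a [R[s][a] + gamma * sum_t P[s][a][t] V(t)]."""
    n = len(values)
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * values[t] for t in range(n))
            for a in range(len(R[s]))
        )
        for s in range(n)
    ]


def aggregate_projection(values: Sequence[float], groups: Sequence[int]) -> List[float]:
    """Project onto piecewise-constant functions: every state receives its group's mean value."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [totals[g] / counts[g] for g in groups]


def approximate_value_iteration(P, R, gamma: float, groups: Sequence[int], n_iterations: int) -> List[float]:
    """Iterate V_{k+1} = Pi(T V_k) starting from V_0 = 0."""
    values = [0.0] * len(groups)
    for _ in range(n_iterations):
        values = aggregate_projection(bellman_backup(values, P, R, gamma), groups)
    return values


# Usage on a hypothetical 2-state, 1-action MDP:
# state 0 -> state 1 with reward 0, state 1 -> state 1 with reward 1.
P = [[[0.0, 1.0]], [[0.0, 1.0]]]
R = [[0.0], [1.0]]
exact = approximate_value_iteration(P, R, 0.5, groups=[0, 1], n_iterations=100)
aggregated = approximate_value_iteration(P, R, 0.5, groups=[0, 0], n_iterations=100)
```

With one group per state the projection is the identity, so the loop reduces to exact value iteration. Merging the two states shows how the approximation changes the fixed point.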
Tackling large approximate dynamic programming or reinforcement learning problems requires ...
Many reinforcement learning approaches can be formulated using the theory of Markov decisi...
We consider the use of two additive control variate methods to reduce the variance of performance gr...
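The idea behind additive control variates can be illustrated with the simplest example: a constant baseline $b$ subtracted from the reward in a score-function (REINFORCE-style) gradient estimate. This is a toy sketch. It assumes a one-step Bernoulli policy $\pi(a=1) = \sigma(\theta)$ with made-up rewards, and the function names are hypothetical. The abstract does not say which two control variates it studies. Subtracting $b$ leaves the estimator unbiased because $\mathbb{E}[\nabla_\theta \log \pi(a)] = 0$.

```python
import math
import random
import statistics
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gradient_samples(theta: float, rewards: List[float], baseline: float, n: int, seed: int) -> List[float]:
    """Per-sample estimates (r(a) - b) * d/dtheta log pi(a) for a Bernoulli policy."""
    rng = random.Random(seed)
    p = sigmoid(theta)  # probability of choosing action 1
    samples = []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        score = a - p  # d/dtheta log pi(a) when pi(1) = sigmoid(theta)
        samples.append((rewards[a] - baseline) * score)
    return samples


# Usage with hypothetical rewards r(0) = 9, r(1) = 10; the true gradient at theta = 0 is p(1 - p) = 0.25.
rewards = [9.0, 10.0]
plain = gradient_samples(0.0, rewards, baseline=0.0, n=1000, seed=0)
with_cv = gradient_samples(0.0, rewards, baseline=9.5, n=1000, seed=0)
plain_stats = (statistics.mean(plain), statistics.pvariance(plain))
cv_stats = (statistics.mean(with_cv), statistics.pvariance(with_cv))
```

In this contrived case, $b = 9.5$ makes every per-sample estimate equal to the true gradient, so the variance drops to zero. In general, a well-chosen baseline reduces variance but does not eliminate it.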
Temporally extended actions have proven useful for reinforcement learning, but their duration also m...
Approximate Value Iteration (AVI) is a method for solving large Markov Decisio...