We analyze the use of simultaneous perturbation stochastic approximation (SPSA), a stochastic optimization technique, for solving reinforcement learning problems. In particular, we consider partially observable settings and leverage the short-term memory capabilities of echo state networks (ESNs) to learn parameterized control policies. Using SPSA, we propose three variants for adapting the weight matrices of an ESN to the task at hand. Experimental results on classic control problems with both discrete and continuous action spaces show that ESNs trained with the SPSA approaches outperform conventional ESNs trained with temporal difference and policy gradient methods.
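For readers unfamiliar with SPSA, the core update can be sketched in a few lines. The sketch below is a generic, textbook-style SPSA minimizer and not one of the three variants proposed in the paper. The function names (`spsa_minimize`, `quadratic`), the gain constants, and the toy quadratic loss are illustrative assumptions. In the setting of the abstract, `theta` would presumably hold the flattened ESN weights and `loss` the negative episodic return, but that mapping is also an assumption.

```python
import random


def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize loss(theta) with basic SPSA; theta is a list of floats.

    The gain constants a and c are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        # Decaying gain sequences; the exponents 0.602 and 0.101 are
        # commonly cited practical defaults for SPSA.
        a_k = a / k ** 0.602
        c_k = c / k ** 0.101
        # Rademacher perturbation: every coordinate is +1 or -1.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations yield an estimate of the full gradient,
        # regardless of the number of parameters.
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        grad = [diff / d for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


def quadratic(theta):
    """Toy stand-in for a task loss, minimized at theta = (1, 1, 1)."""
    return sum((t - 1.0) ** 2 for t in theta)


result = spsa_minimize(quadratic, [0.0, 0.0, 0.0])
print(quadratic([0.0, 0.0, 0.0]), quadratic(result))
```

Each iteration needs only two loss evaluations however many parameters there are, which is what makes SPSA attractive when the loss is a noisy rollout return and no analytic gradient is available.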
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and determ...
We present an algorithm for policy search in stochastic dynamical systems using model-based reinforc...
Concerned with neural learning without backpropagation, we investigate variants of the simultaneous ...
The problem of synthesizing stochastic explicit model predictive control policies is known to be qui...
In this work, we address the problem of learning provably stable neural network policies for stochas...
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligenc...
The robust neural controller based on the SPSA has been developed to obtain the guaranteed stability...
We propose novel policy search algorithms in the context of off-policy, batch mode rein...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
Gradient-based approaches to direct policy search in reinforcement learning have received much rece...
Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as ...
Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of...
Recent advancements in deep reinforcement learning for real control tasks have received interest fro...
We are interested in understanding stability (almost sure boundedness) of stochastic approximation a...