In this thesis, the global convergence of model-based and model-free gradient descent and natural policy gradient descent algorithms is studied for a class of linear quadratic deep structured teams. In such systems, agents are partitioned into a few sub populations wherein the agents in each sub-population are coupled in the dynamics and cost function through a set of linear regressions of the states and actions of all agents. Every agent observes its local state and the linear regressions of states, called deep states. For a sufficiently small risk factor and/or sufficiently large population, we prove that model-based policy gradient methods globally converge to the optimal solution. Given an arbitrary number of agents, we develop model-f...
A feedforward network composed of units of teams of parameterized learning automata is considered as...
Policy gradient methods are a type of reinforcement learning techniques that rely upon optimizing pa...
Deep deterministic policy gradient algorithm operating over continuous space of actions has attracte...
The linear quadratic framework is widely studied in the literature on stochastic control and game th...
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic reg...
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for...
International audienceOptimal control of nonlinear systems is a difficult problem which has been add...
This paper deals with distributed reinforcement learning problems with safety constraints. In partic...
In this paper, we will deal with a linear quadratic optimal control problem with unknown dynamics. A...
In this paper, we linear quadratic team decision problems, where a team of agents minimizes a convex...
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) ha...
Thesis (Ph.D.)--University of Washington, 2020In this thesis, we shall study optimal control problem...
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized...
Reinforcement learning has recently become a promising area of machine learning with significant ach...
Deep reinforcement learning makes it possible to train control policies that map high-dimensional ob...
A feedforward network composed of units of teams of parameterized learning automata is considered as...
Policy gradient methods are a type of reinforcement learning techniques that rely upon optimizing pa...
Deep deterministic policy gradient algorithm operating over continuous space of actions has attracte...
The linear quadratic framework is widely studied in the literature on stochastic control and game th...
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic reg...
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for...
International audienceOptimal control of nonlinear systems is a difficult problem which has been add...
This paper deals with distributed reinforcement learning problems with safety constraints. In partic...
In this paper, we will deal with a linear quadratic optimal control problem with unknown dynamics. A...
In this paper, we linear quadratic team decision problems, where a team of agents minimizes a convex...
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) ha...
Thesis (Ph.D.)--University of Washington, 2020In this thesis, we shall study optimal control problem...
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized...
Reinforcement learning has recently become a promising area of machine learning with significant ach...
Deep reinforcement learning makes it possible to train control policies that map high-dimensional ob...
A feedforward network composed of units of teams of parameterized learning automata is considered as...
Policy gradient methods are a type of reinforcement learning techniques that rely upon optimizing pa...
Deep deterministic policy gradient algorithm operating over continuous space of actions has attracte...