The performance of state-of-the-art offline and model-based reinforcement learning (RL) algorithms deteriorates significantly under severe data scarcity and in the presence of heterogeneous agents. In this work, we propose a model-based offline RL method to address this setting. Using all available data from the various agents, we construct a personalized simulator for each individual agent, which is then used to train RL policies. We do so by modeling the transition dynamics of the agents as a low-rank tensor decomposition of latent factors associated with agents, states, and actions. We perform experiments on various benchmark environments and demonstrate improvement over existing offline approaches in the scarce-data regime.
M.Eng.
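To make the low-rank idea concrete, the following is a minimal sketch and not the method of this work. It assumes a discretized setting in which entry `X[n, s, a]` holds one scalar next-state value for agent `n`, state bin `s`, and action bin `a`, and that only a fraction of the entries were observed. It fits a rank-`r` CP factorization with ridge-regularized alternating least squares on the observed entries only. The names `fit_cp_als` and `reconstruct`, the ridge penalty, the synthetic data, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def reconstruct(factors):
    """Rebuild the full tensor from agent, state, and action factors."""
    U, V, W = factors
    return np.einsum("nr,sr,ar->nsa", U, V, W)

def fit_cp_als(X, mask, rank=2, n_iters=50, ridge=1e-3, seed=0):
    """Fit X[n, s, a] ~ sum_r U[n, r] * V[s, r] * W[a, r] on observed entries.

    mask is a boolean array, True where X was observed. This is an
    illustrative sketch under the assumptions stated in the text.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.normal(scale=0.1, size=(dim, rank)) for dim in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Bring the mode being updated to the front.
            Xm = np.moveaxis(X, mode, 0)
            Mm = np.moveaxis(mask, mode, 0)
            B, C = [factors[m] for m in range(3) if m != mode]
            # One design row per pair of indices in the two remaining modes.
            design = (B[:, None, :] * C[None, :, :]).reshape(-1, rank)
            for i in range(X.shape[mode]):
                obs = Mm[i].reshape(-1)
                if not obs.any():
                    continue  # nothing observed for this index; keep old factor
                D = design[obs]
                y = Xm[i].reshape(-1)[obs]
                # Ridge-regularized least squares for this row of the factor.
                gram = D.T @ D + ridge * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, D.T @ y)
    return factors

# Synthetic example: 5 agents, 6 state bins, 4 action bins, true rank 2.
rng = np.random.default_rng(1)
n_agents, n_states, n_actions, true_rank = 5, 6, 4, 2
true_factors = [rng.normal(size=(d, true_rank))
                for d in (n_agents, n_states, n_actions)]
X = reconstruct(true_factors)
mask = rng.random(X.shape) < 0.6  # roughly 60% of entries observed

factors = fit_cp_als(X, mask, rank=true_rank)
X_hat = reconstruct(factors)
observed_mse = np.mean((X_hat - X)[mask] ** 2)
zero_baseline_mse = np.mean(X[mask] ** 2)  # error of always predicting zero
```

In this sketch, the slice `X_hat[n]` plays the role of a personalized simulator for agent `n`: it predicts outcomes for state-action pairs that agent never visited by sharing the state and action factors learned from all agents' data.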
Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the ...
In this dissertation we develop new methodologies and frameworks to address challenges in offline re...
Offline reinforcement learning could learn effective policies from a fixed dataset, which is promisi...
Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is imprac...
Offline reinforcement learning (RL) has received rising interest due to its appealing data efficien...
Offline Reinforcement Learning (RL) aims to turn large datasets into powerful ...
We present a model-based offline reinforcement learning policy performance lower bound that explicit...
Humans can develop their internal model of the external world and use it for decision making. Reinfo...
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, withou...
Training Reinforcement Learning (RL) agents in high-stakes applications might be too prohibitive due...
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforceme...
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learni...
Recent advances in reinforcement learning (RL) provide exciting potential for making agents...
Offline reinforcement learning leverages previously-collected offline datasets to learn optimal poli...
Offline reinforcement learning (offline RL) considers problems where learning is performed using onl...