Unlike the standard Reinforcement Learning (RL) model, many real-world tasks are non-Markovian: their rewards depend on the state history rather than solely on the current state. Such tasks arise frequently in practical applications such as autonomous driving, financial trading, and medical diagnosis, and solving them can be quite challenging. We propose a novel RL approach for handling non-Markovian rewards expressed in the temporal logic LTL$_f$ (Linear Temporal Logic over Finite Traces). To this end, an encoding of linear complexity from LTL$_f$ into MDPs (Markov Decision Processes) is introduced to take advantage of advanced RL algorithms. Then, a prioritized experience replay technique based on the automata structure (semantics equiv...
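The general idea of making a history-dependent reward Markovian can be illustrated with a standard product construction: the environment state is paired with the state of a finite automaton that tracks progress on the temporal formula. The sketch below is a generic illustration under stated assumptions and is not the linear-complexity encoding proposed in the abstract above. The formula F(a & F b), the hand-written three-state DFA, and the names `dfa_step`, `product_step`, and `run` are all hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Tuple

Label = FrozenSet[str]  # atomic propositions that hold at one step

# Hypothetical hand-written DFA for the LTL_f formula F(a & F b):
#   0 = nothing seen yet, 1 = "a" seen and waiting for "b", 2 = accepting.
ACCEPTING = {2}


def dfa_step(q: int, label: Label) -> int:
    """Advance the DFA by one step given the propositions true at that step."""
    if q == 0:
        if "a" in label:
            # F b includes the current step, so "a" and "b" together accept.
            return 2 if "b" in label else 1
        return 0
    if q == 1:
        return 2 if "b" in label else 1
    return 2  # once satisfied, the formula stays satisfied


def product_step(
    q: int, next_env_state: Hashable, label: Label
) -> Tuple[Tuple[Hashable, int], float]:
    """One transition of the product process.

    The reward depends only on the current and next automaton state,
    so it is Markovian over the pair (environment state, automaton state).
    """
    q_next = dfa_step(q, label)
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return (next_env_state, q_next), reward


def run(trace: Iterable[Tuple[Hashable, Label]]) -> Tuple[int, float]:
    """Replay a finite trace; return the final automaton state and total reward."""
    q, total = 0, 0.0
    for env_state, label in trace:
        (_, q), r = product_step(q, env_state, label)
        total += r
    return q, total
```

An ordinary RL algorithm could then be run on the product states, since the reward no longer depends on anything outside the current product state and transition.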
We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a ri...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
Reward engineering is an important aspect of reinforcement learning. Whether or not the users’ inten...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
This paper addresses the problem of learning control policies for mobile robots, modeled as unknown ...
In recent years, researchers have made significant progress in devising reinforcement-learning algor...
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on t...
Reinforcement Learning (RL) is a widely employed machine learning architecture that has been applied...
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent us...
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on t...
Reactive synthesis algorithms allow automatic construction of policies to control an environment mod...
Techniques based on reinforcement learning (RL) have been used to build systems that learn t...
We present a model-free reinforcement learning algorithm to synthesize control policies that maximiz...