The main contribution of this work is a novel algorithm for machine reinforcement learning in problems where the agent's reinforcement signal is subject to a Poissonian stochastic time delay. Despite this noise in the reinforcement signal, the algorithm can learn a suitable control policy for the agent's environment. The approach handles reinforcements that are received out of order in time or that overlap, a case not previously considered in the literature. The proposed algorithm is evaluated in simulation, and its performance is compared with that of a standard Q-learning algorithm. The simulations show that the proposed method improves the performance of a learning agent in an environment with Poissonian-type stochastically delayed rewa...
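To make the setting concrete, the following is a minimal sketch of a reward channel with Poisson-distributed delays. It illustrates only the problem described above, not the proposed algorithm. The class name `PoissonDelayedRewards`, the helper `sample_poisson`, the rate parameter `lam`, and the choice to sum rewards that arrive at the same step are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) integer using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class PoissonDelayedRewards:
    """Hypothetical channel that holds each reward back for a Poisson number of steps."""

    def __init__(self, lam, seed=0):
        self.lam = lam
        self.rng = random.Random(seed)
        # Maps arrival step -> list of (emission step, reward) pairs.
        self.pending = defaultdict(list)

    def emit(self, step, reward):
        """Schedule a reward generated at `step` to arrive after a random delay."""
        delay = sample_poisson(self.lam, self.rng)
        self.pending[step + delay].append((step, reward))

    def receive(self, step):
        """Return the summed reward arriving at `step` and the individual arrivals.

        Rewards emitted at different steps may arrive together (overlap) or in
        a different order than they were emitted.
        """
        arrivals = self.pending.pop(step, [])
        return sum(r for _, r in arrivals), arrivals


if __name__ == "__main__":
    channel = PoissonDelayedRewards(lam=2.0, seed=1)
    for t in range(10):
        channel.emit(t, reward=float(t))
        total, arrivals = channel.receive(t)
        print(t, total, arrivals)
```

In this sketch a learning agent would see only the summed signal from `receive`; the emission steps are returned purely so the reordering and overlap can be inspected. A standard Q-learning update applied to that signal would credit each reward to the most recent state-action pair rather than to the one that produced it, which is the difficulty the abstract describes.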
A class of nonlinear learning algorithms for the Q- and S-model stochastic automaton-random environme...
We present a distributed variant of Q-learning that allows learning the optimal cost-to-go function ...
We consider the multi-armed restless bandit problem (RMABP) with an infinite horizon average cost ob...
The main contribution of this work is a novel learning algorithm for machine reinforcement learning ...
This paper investigates reinforcement learning problems where a stochastic time delay is present in ...
Reinforcement learning scales poorly when reinforcements are delayed. The problem of propagating inf...
In this paper, we present a brief survey of reinforcement learning, with particular emphasis on stoc...
In this paper, we discuss situations arising with reinforcement learning algorithms, when the reinfo...
This paper addresses the problem of learning multidimensional control actions from delayed rewards. ...
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodo...
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision P...
We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the...
A stochastic automaton can perform a finite number of actions in a random environment. Wh...
We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observ...
We consider three-tier network architecture modeled with two physical nodes in...