The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency over gamma (discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-respon...
A key challenge in multiagent environments is the construction of agents that are able to learn whil...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
We examine the use of stationary and Markov strategies in zero-sum stochastic games with finite stat...
Markov games are a framework which can be used to formalise n-agent reinforcement learning (RL). Litt...
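The value-iteration idea underlying RL in zero-sum Markov games can be sketched in a minimal special case. The code below is our own illustration, not taken from any of the papers above: it assumes a *turn-based* discounted zero-sum game, where in each state only one player moves, so the Shapley/Bellman operator reduces to a max over actions in the maximizer's states and a min in the minimizer's (no matrix-game solve is needed). The toy game, discount factor, and names are all hypothetical.

```python
GAMMA = 0.9  # discount factor (hypothetical value)

# Toy turn-based game: state 0 belongs to the max player, state 1 to the
# min player. Each action is (reward_for_max_player, next_state);
# dynamics are deterministic here purely to keep the sketch short.
ACTIONS = {
    0: [(1.0, 1), (0.0, 0)],  # max player's choices in state 0
    1: [(2.0, 0), (0.0, 1)],  # min player's choices in state 1
}
MAX_STATES = {0}  # states where the maximizing player moves


def bellman(v):
    """One application of the turn-based zero-sum Bellman operator."""
    new_v = {}
    for s, acts in ACTIONS.items():
        backups = [r + GAMMA * v[ns] for r, ns in acts]
        # Maximizer improves the value; minimizer pushes it down.
        new_v[s] = max(backups) if s in MAX_STATES else min(backups)
    return new_v


def value_iteration(tol=1e-8):
    """Iterate the operator to its (gamma-contraction) fixed point."""
    v = {s: 0.0 for s in ACTIONS}
    while True:
        new_v = bellman(v)
        if max(abs(new_v[s] - v[s]) for s in v) < tol:
            return new_v
        v = new_v
```

In this toy instance the fixed point is V(0) = 1 and V(1) = 0: the min player prefers the zero-reward self-loop in state 1, so the max player's best move in state 0 is the action that collects reward 1.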
This paper investigates value function approximation in the context of zero-sum Marko...
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
An ideal strategy in zero-sum games should not only grant the player an average reward no less than ...
In this paper, we study the learning problem in two-player general-sum Markov Games. We consider the...
This paper presents a number of successive approximation algorithms for the repeated two-person zero...
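Successive-approximation schemes for zero-sum games repeatedly solve a matrix (stage) game at each state. As a hedged illustration of that inner step, assuming nothing from the paper beyond the standard theory, a 2x2 matrix game [[a, b], [c, d]] (payoffs to the row/maximizing player) admits a classical closed-form minimax value, which sidesteps a full linear-programming solve in this special case. The function name is ours.

```python
def matrix_game_value_2x2(a, b, c, d):
    """Minimax value of the 2x2 zero-sum matrix game [[a, b], [c, d]]."""
    maximin = max(min(a, b), min(c, d))  # row player's best pure guarantee
    minimax = min(max(a, c), max(b, d))  # column player's best pure guarantee
    if maximin == minimax:               # saddle point in pure strategies
        return maximin
    # Otherwise both players mix, and the standard closed form applies
    # (the denominator is nonzero whenever there is no saddle point).
    return (a * d - b * c) / (a - b - c + d)
```

For example, matching pennies [[1, -1], [-1, 1]] has value 0 (both players mix uniformly), while [[3, 1], [2, 0]] has a pure-strategy saddle point with value 1. General n x m stage games require a linear program rather than this shortcut.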
It has previously been established that for Markov learning automata games, the game equilibria are...
We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark ...
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected datas...
Dynamic zero-sum games are a model of multiagent decision-making that has been well-studied in the m...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in ...