We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed as pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a pol...
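To make the pessimism construction concrete, the snippet below is a minimal, tabular sketch of the idea with a count-based bonus, not the paper's algorithm: PMVI itself is stated with linear function approximation and a feature-based uncertainty quantifier, and it builds lower-confidence estimates for the max-player together with the symmetric construction for the min-player. All names here (solve_matrix_game, pessimistic_minimax_vi, the bonus scale beta) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: tabular pessimistic value iteration for an episodic
# two-player zero-sum Markov game learned from an offline dataset.
# Assumptions (not from the paper): tabular states/actions, a count-based
# bonus beta / sqrt(n), and rewards in [0, 1].

import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(A):
    """NE value and max-player strategy of the matrix game max_x min_y x^T A y."""
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value); minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j: v - sum_i A[i, j] * x_i <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum_i x_i = 1
    b_eq = np.ones(1)
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:m]


def pessimistic_minimax_vi(dataset, S, A, B, H, beta=1.0):
    """Backward value iteration with a pessimism bonus on the offline dataset.

    dataset: list of trajectories, each a list of (h, s, a, b, r, s_next).
    Returns lower-confidence values V[h][s] and a max-player policy.
    """
    counts = np.zeros((H, S, A, B))
    rew_sum = np.zeros((H, S, A, B))
    next_counts = np.zeros((H, S, A, B, S))
    for traj in dataset:
        for (h, s, a, b, r, s_next) in traj:
            counts[h, s, a, b] += 1
            rew_sum[h, s, a, b] += r
            next_counts[h, s, a, b, s_next] += 1

    V = np.zeros((H + 1, S))
    policy = [dict() for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            Q = np.zeros((A, B))
            for a in range(A):
                for b in range(B):
                    n = counts[h, s, a, b]
                    if n == 0:
                        Q[a, b] = 0.0  # maximal pessimism for unseen pairs
                        continue
                    r_hat = rew_sum[h, s, a, b] / n
                    p_hat = next_counts[h, s, a, b] / n
                    bonus = beta / np.sqrt(n)  # stand-in uncertainty quantifier
                    Q[a, b] = max(0.0, r_hat + p_hat @ V[h + 1] - bonus)
            # Equilibrium solving: NE of the pessimistic matrix game at (h, s).
            V[h, s], policy[h][s] = solve_matrix_game(Q)
    return V, policy
```

Running the same recursion with the bonus added rather than subtracted gives the min-player's upper-confidence values, and the output policy pair is read off from the NE strategies of the two induced matrix games at each state.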
In many real-world problems, there is a dynamic interaction between competitive agents. Partially ob...
We address the online linear optimization problem when the actions of the forecaster are represented...
This paper investigates value function approximation in the context of zero-sum Marko...
We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark ...
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected datas...
We consider a multi-agent noncooperative game with agents’ objective functions being affected by unc...
We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on ...
Motivated by the machine learning perspective that game-theoretic equilibria constraints should serve a...
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In p...
This paper addresses the problem of learning a Nash equilibrium in γ-discounte...
We study decentralized policy learning in Markov games where we control a single agent to play with ...
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to it...
This work regards Nash equilibrium-seeking in multi-player finite games. We present a discrete-time ...
We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in ...
We motivate and propose a new model for non-cooperative Markov game which considers the interactions...