This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Unlike prior efforts that train agents to beat a fixed set of opponents, our objective is to find Nash equilibrium policies that cannot be exploited even by adversarial opponents. We propose (1) the Nash DQN algorithm, which integrates DQN with a Nash-finding subroutine for the joint value functions; and (2) the Nash DQN Exploiter algorithm, which additionally adopts an exploiter to guide the agent's exploration. Our algorithms are practical variants of theoretical algorithms that are guaranteed to converge to Nash equilibria in the basic tabular setting. Experimental evaluation on both tabular exam...
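To make the idea of combining a DQN-style update with a Nash-finding subroutine concrete, here is a minimal sketch of how a bootstrapped target could be built from the joint action values at the next state. The abstract does not say which Nash solver is used, so this sketch swaps in fictitious play, a classical iterative method whose empirical strategies converge to equilibrium in zero-sum matrix games. The names `solve_zero_sum` and `nash_td_target`, the iteration count, and the discount factor are all assumptions made for illustration, not the paper's implementation.

```python
def solve_zero_sum(payoff, iters=20000):
    """Approximate the Nash value of a zero-sum matrix game by fictitious play.

    payoff[i][j] is the payoff to the row player (maximizer) when the row
    player picks i and the column player (minimizer) picks j.
    Hypothetical helper; the solver choice is an assumption of this sketch.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Cumulative payoff of each pure action against the opponent's history.
    row_scores = [0.0] * n_rows
    col_scores = [0.0] * n_cols
    i, j = 0, 0  # arbitrary initial actions
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        for a in range(n_rows):
            row_scores[a] += payoff[a][j]
        for b in range(n_cols):
            col_scores[b] += payoff[i][b]
        # Each player best-responds to the opponent's empirical play so far.
        i = max(range(n_rows), key=lambda a: row_scores[a])
        j = min(range(n_cols), key=lambda b: col_scores[b])
    x = [c / iters for c in row_counts]  # empirical row strategy
    y = [c / iters for c in col_counts]  # empirical column strategy
    value = sum(x[a] * payoff[a][b] * y[b]
                for a in range(n_rows) for b in range(n_cols))
    return value, x, y


def nash_td_target(reward, next_q, gamma=0.99, done=False):
    """One-step target: reward plus the discounted Nash value of the
    joint Q-matrix at the next state (hypothetical helper)."""
    if done:
        return reward
    value, _, _ = solve_zero_sum(next_q)
    return reward + gamma * value


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    value, x, y = solve_zero_sum(matching_pennies)
    print("approx. value:", value, "row strategy:", x, "column strategy:", y)
    print("target:", nash_td_target(1.0, matching_pennies, gamma=0.9))
```

The only change relative to a single-agent DQN target is that the maximum over next-state actions is replaced by the equilibrium value of the next-state payoff matrix. In practice a linear-programming solver would typically give an exact value where fictitious play only approximates it.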
We develop a computational method to identify all pure strategy equilibrium points in the strategy s...
General Game Playing agents are required to play games they have never seen before simply by looking...
28 pages. Consider a 2-player normal-form game repeated over time. We introduce an adaptive learning p...
Model-free learning for multi-agent stochastic games is an active area of research. Existing reinfor...
Many real-world applications can be described as large-scale games of imperfect information. To deal...
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often c...
In this thesis, we explore the use of policy approximation for reducing the computational cost of le...
In this paper we propose a novel Deep Reinforcement Learning (DRL) algorithm that uses the concept o...
Algorithms designed for single-agent reinforcement learning (RL) generally fail to converge to equil...
Reinforcement learning has shown much success in games such as chess, backgammon and Go [21,24,22]....
Since DeepMind pioneered a deep reinforcement learning (DRL) model to play the Atari games, DRL has ...
Dynamic zero-sum games are a model of multiagent decision-making that has been well-studied in the m...
In the past decade, learning algorithms developed to play video games better than humans have become...
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents’ ...
Markov games is a framework which can be used to formalise n-agent reinforcement learning (RL). Litt...