A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This report provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner’s dilemmas. It is shown that under certain assumptions cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner’s dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. It is found experimentally that the theoretical results derived in this report are quite robust to violations of the underlying assumptions.
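To make the setting concrete, the following is a minimal sketch of two independent Q-learners playing an iterated prisoner's dilemma. It is not the report's own formulation: the single-state (stateless) Q-table, the epsilon-greedy exploration, the payoff values in `PAYOFF`, and all function and parameter names (`choose`, `q_update`, `simulate`, `alpha`, `gamma`, `epsilon`) are assumptions made here for illustration only.

```python
import random

# Hypothetical payoffs for the first-named player, using the usual
# prisoner's dilemma ordering T > R > P > S (here 5 > 3 > 1 > 0).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")  # cooperate, defect


def choose(q, epsilon, rng):
    """Epsilon-greedy action selection over a single-state Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[a])


def q_update(q, action, reward, alpha, gamma):
    """Q-learning update with one implicit state (an assumption of this sketch)."""
    target = reward + gamma * max(q.values())
    q[action] += alpha * (target - q[action])


def simulate(rounds=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run two independent learners and return their Q-tables and the cooperation rate."""
    rng = random.Random(seed)
    q1 = {a: 0.0 for a in ACTIONS}
    q2 = {a: 0.0 for a in ACTIONS}
    cooperative_moves = 0
    for _ in range(rounds):
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        # Each agent is rewarded according to its own action and the opponent's.
        q_update(q1, a1, PAYOFF[(a1, a2)], alpha, gamma)
        q_update(q2, a2, PAYOFF[(a2, a1)], alpha, gamma)
        cooperative_moves += (a1 == "C") + (a2 == "C")
    return q1, q2, cooperative_moves / (2 * rounds)


if __name__ == "__main__":
    q1, q2, rate = simulate()
    print("Q-values agent 1:", q1)
    print("Q-values agent 2:", q2)
    print("cooperation rate:", rate)
```

Whether cooperation emerges in such a simulation depends on the learning rate, discount factor, exploration scheme, and payoffs, so the sketch makes no claim about the outcome; it only fixes the mechanics that the analysis refers to.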
This work considers a stateless Q-learning agent in the iterated Prisoner's Dilemma (PD). ...
We present a conceptual framework for creating Q-learning-based algorithms that converge to optimal e...
Q-learning is a recent reinforcement learning (RL) algorithm that does not need a model of its environ...
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate ...
One of the important issues in intelligent systems and robotics is to develop an efficient method to...