When an agent learns in a multi-agent environment, the payoff it receives is dependent on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multi-agent systems more difficult than single-agent learning. Prior attempts at value-function based learning in such domains have used off-policy Q-learning that do not scale well as the cornerstone, with restricted success. This paper studies on-policy modifications of such algorithms, with the promise of scalability and efficiency. In particular, it is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions. It is also shown, experimentally, that...
Reinforcement learning is a promising technique for learning agents to adapt their own strategies in...
This paper investigates the problem of policy learning in multiagent environments using the stochast...
Two multi-agent policy iteration learning algorithms are proposed in this work. The two proposed alg...
This paper investigates the use of experience generaliza-tion on concurrent and on-line policy learn...
This article investigates the performance of independent reinforcement learners in multi-agent games...
This work presents a fully distributed algorithm for learning the optimal policy in a multi-agent co...
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. Thi...
In this paper we compare state-of-the-art multi-agent reinforcement learning algorithms in a wide va...
On-line learning methods have been applied successfully in multi-agent systems to achieve coordinati...
Most research in reinforcement learning has focused on stationary environments. In this paper, we pr...
Most research in reinforcement learning has focused on stationary environments. In this paper, we pr...
With great success in Reinforcement Learning’s application to a suite of single-agent environments, ...
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents ’ ...
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate ...
Reinforcement learning is the problem faced by an agent that must learn behaviour through trial-and-...
Reinforcement learning is a promising technique for learning agents to adapt their own strategies in...
This paper investigates the problem of policy learning in multiagent environments using the stochast...
Two multi-agent policy iteration learning algorithms are proposed in this work. The two proposed alg...
This paper investigates the use of experience generaliza-tion on concurrent and on-line policy learn...
This article investigates the performance of independent reinforcement learners in multi-agent games...
This work presents a fully distributed algorithm for learning the optimal policy in a multi-agent co...
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. Thi...
In this paper we compare state-of-the-art multi-agent reinforcement learning algorithms in a wide va...
On-line learning methods have been applied successfully in multi-agent systems to achieve coordinati...
Most research in reinforcement learning has focused on stationary environments. In this paper, we pr...
Most research in reinforcement learning has focused on stationary environments. In this paper, we pr...
With great success in Reinforcement Learning’s application to a suite of single-agent environments, ...
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents ’ ...
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate ...
Reinforcement learning is the problem faced by an agent that must learn behaviour through trial-and-...
Reinforcement learning is a promising technique for learning agents to adapt their own strategies in...
This paper investigates the problem of policy learning in multiagent environments using the stochast...
Two multi-agent policy iteration learning algorithms are proposed in this work. The two proposed alg...