In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce tha...
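To make the idea of a monotonic combination of per-agent values concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the one-layer mixer, the use of `abs()` to keep the state-dependent weights non-negative, and the names `mix`, `hyper_w`, `hyper_b`, `greedy_decentralised` and `greedy_joint` are all illustrative assumptions. It only shows that when the joint value is non-decreasing in every agent's value, each agent picking its own greedy action also maximises the joint value.

```python
import itertools


def mix(agent_qs, state, hyper_w, hyper_b):
    """Hypothetical one-layer monotonic mixer (illustration only).

    Each mixing weight is a linear function of the global state passed
    through abs(), so it is non-negative and the joint value is
    non-decreasing in every per-agent value.
    """
    weights = [abs(sum(h * s for h, s in zip(row, state))) for row in hyper_w]
    bias = sum(h * s for h, s in zip(hyper_b, state))  # bias is unconstrained
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


def greedy_decentralised(per_agent_q):
    """Each agent picks the argmax of its own values, using no global state."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def greedy_joint(per_agent_q, state, hyper_w, hyper_b):
    """Brute-force argmax of the mixed value over all joint actions."""
    joint_actions = itertools.product(*(range(len(q)) for q in per_agent_q))

    def q_tot(joint):
        chosen = [q[a] for q, a in zip(per_agent_q, joint)]
        return mix(chosen, state, hyper_w, hyper_b)

    return max(joint_actions, key=q_tot)


# Toy numbers (assumed): two agents with 3 and 2 actions, a 2-dimensional state.
per_agent_q = [[0.1, 0.9, 0.3], [0.5, -0.2]]
state = [1.0, -2.0]
hyper_w = [[0.5, 1.0], [-0.3, 0.2]]  # one row of state coefficients per agent
hyper_b = [0.1, 0.05]

print(greedy_decentralised(per_agent_q))                   # (1, 0)
print(greedy_joint(per_agent_q, state, hyper_w, hyper_b))  # (1, 0)
```

The brute-force search grows exponentially with the number of agents, whereas the decentralised argmax is linear in it. The monotonicity constraint is what lets the cheap per-agent argmax replace the joint search.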
International Conference on Autonomous Agents and Multiagent Systems (AAMAS '23), 29 May - 2 June 20...
Tackling overestimation in Q-learning is an important problem that has been extensively studied in s...
In multi-agent reinforcement learning, the use of a global objective is a powerful tool for incentiv...
In many real-world settings, a team of agents must coordinate their behaviour while acting in a dece...
QMIX is a popular Q-learning algorithm for cooperative MARL in the centralised training and decentra...
With great success in Reinforcement Learning’s application to a suite of single-agent environments, ...
This work presents a sample efficient and effective value-based method, named SMIX(λ), for reinforce...
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized ac...
This paper introduces four new algorithms that can be used for tackling multi-agent rei...
When individuals interact with one another to accomplish specific goals, they learn from others ...
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative ...
A growing number of real-world control problems require teams of software agents to solve a joint ta...
The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging benchmark problem for ...
The exploitation of extra state information has been an active research area in multi-agent reinforc...