The policy gradient method is a popular technique for implementing reinforcement learning in an agent system. One of the reasons is that a policy gradient learner has a simple design and strong theoretical properties in single-agent domains. Previously, Williams showed that the REINFORCE algorithm is a special case of policy gradient learning. He also showed that a learning automaton could be seen as a special case of the REINFORCE algorithm. Learning automata theory guarantees that a group of automata will converge to a stable equilibrium in team games. In this paper we will show a theoretical connection between learning automata and policy gradient methods to transfer this theoretical result to multi-agent policy gradient learning. An app...