Publications

CoACT: Coordination via Aligned Centralized Training in Multi-Agent Reinforcement Learning

Oussama Azizi, Frans A. Oliehoek, and Matthijs T.J. Spaan. CoACT: Coordination via Aligned Centralized Training in Multi-Agent Reinforcement Learning. In AAMAS Workshop on Causal Learning and Reasoning in Agents and Multi-Agent Systems (CLaRAMAS), May 2026.

Abstract

Cooperative Multi-Agent Reinforcement Learning (MARL) has achieved remarkable success in complex tasks, with Centralized Training Decentralized Execution (CTDE) being the dominant paradigm for training cooperative agents. However, most CTDE methods do not exploit the fact that agents are heterogeneous, each receiving observations generated by a different subset of state factors. In this work, we show that this heterogeneity can be exploited for more efficient learning: by treating agents’ observations as partial views of the same underlying factors, and leveraging this structural dependency, we can align the agents’ representations during training to improve the sample efficiency of existing state-of-the-art CTDE algorithms.

BibTeX Entry

@inproceedings{Azizi26CLARAMAS,
    title =     {CoACT: Coordination via Aligned Centralized Training in Multi-Agent 
                 Reinforcement Learning},
    author =    {Oussama Azizi and Frans A. Oliehoek and Matthijs T.J. Spaan},
    booktitle = {AAMAS Workshop on Causal Learning and Reasoning in Agents and 
                 Multi-Agent Systems (CLaRAMAS)},
    year =      2026,
    month =     may,
    url =       {https://openreview.net/forum?id=Oc0QipuIV0},
    keywords =  {refereed, workshop},
    abstract={
Cooperative Multi-Agent Reinforcement Learning (MARL)
has achieved remarkable success in complex tasks, with Centralized Training
Decentralized Execution (CTDE) being the dominant paradigm for
training cooperative agents. However, most CTDE methods do not
exploit the fact that agents are heterogeneous, each receiving observations
generated by a different subset of state factors. In this work, we show
that this heterogeneity can be exploited for more efficient learning:
by treating agents’ observations as partial views of the same underlying
factors, and leveraging this structural dependency, we can align the
agents’ representations during training to improve the sample efficiency of
existing state-of-the-art CTDE algorithms.
    }
}
