Publications

Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, and Frans A. Oliehoek. Uncoupled Learning of Differential Stackelberg Equilibria with Commitments. In Proceedings of the Twenty-Third International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2024.

Download

pdf [585.1kB]  

Abstract

In multi-agent problems requiring a high degree of cooperation, success often depends on the ability of the agents to adapt to each other’s behavior. A natural solution concept in such settings is the Stackelberg equilibrium, in which the “leader” agent selects the strategy that maximizes its own payoff given that the “follower” agent will choose their best response to this strategy. Recent work has extended this solution concept to two-player differentiable games, such as those arising from multi-agent deep reinforcement learning, in the form of the differential Stackelberg equilibrium. While this previous work has presented learning dynamics which converge to such equilibria, these dynamics are “coupled” in the sense that the learning updates for the leader’s strategy require some information about the follower’s payoff function. As such, these methods cannot be applied to truly decentralised multi-agent settings, particularly ad hoc cooperation, where each agent only has access to its own payoff function. In this work we present “uncoupled” learning dynamics based on zeroth-order gradient estimators, in which each agent’s strategy update depends only on their observations of the other’s behavior. We analyze the convergence of these dynamics in general-sum games, and prove that they converge to differential Stackelberg equilibria under the same conditions as previous coupled methods. Furthermore, we present an online mechanism by which symmetric learners can negotiate leader-follower roles. We conclude with a discussion of the implications of our work for multi-agent reinforcement learning and ad hoc collaboration more generally.
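
To make the uncoupled, zeroth-order idea concrete, here is a minimal illustrative sketch in Python (not the paper's algorithm or notation: the quadratic payoffs, step sizes, perturbation scale, and the helper follower_response are all hypothetical choices made for this example). The leader commits to a randomly perturbed strategy, the follower adapts to that commitment using only its own observed payoffs, and the leader then estimates the gradient of its payoff through the follower's response, again from payoff observations alone.

import numpy as np

rng = np.random.default_rng(0)

def payoff_leader(x, y):
    return -(x - 1.0) ** 2 - 0.5 * (x - y) ** 2   # example quadratic payoff

def payoff_follower(x, y):
    return -(y - 2.0) ** 2 - 0.5 * (x - y) ** 2   # example quadratic payoff

def follower_response(x, y0, steps=50, lr=0.1, mu=0.05):
    # Follower adapts to the leader's committed strategy x using only
    # zeroth-order estimates of its own payoff (no analytic gradients,
    # no access to the leader's payoff function).
    y = y0
    for _ in range(steps):
        u = rng.normal()
        g = (payoff_follower(x, y + mu * u) - payoff_follower(x, y)) * u / mu
        y += lr * g
    return y

x, y = 0.0, 0.0
mu, lr_leader = 0.05, 0.02

for t in range(2000):
    # Leader commits to a perturbed strategy, observes its own payoff after
    # the follower has responded, and forms a zeroth-order estimate of the
    # total derivative of its payoff through the follower's response.
    u = rng.normal()
    y_plus = follower_response(x + mu * u, y)
    y_base = follower_response(x, y)
    g = (payoff_leader(x + mu * u, y_plus) - payoff_leader(x, y_base)) * u / mu
    x += lr_leader * g
    y = y_base

print(f"approximate Stackelberg point: x = {x:.3f}, y = {y:.3f}")

In this toy game the follower's exact best response is y = (4 + x) / 3, so the loop should settle near x ≈ 1.18, y ≈ 1.73; the paper's actual dynamics, convergence guarantees, and the mechanism for negotiating leader-follower roles are considerably more general than this sketch.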

BibTeX Entry

@inproceedings{Loftin24AAMAS,
    title=      {Uncoupled Learning of Differential Stackelberg Equilibria with Commitments},
    author=     {Loftin, Robert and {\c C}elikok, Mustafa Mert and van Hoof, Herke 
                 and Kaski, Samuel and Oliehoek, Frans A},
    booktitle = AAMAS24,
    location=   {Auckland, New Zealand},
    organization={IFAAMAS},
    year =      2024,
    month =     may,
    keywords =   {refereed},
    abstract =  {
        In multi-agent problems requiring a high degree of cooperation,
        success often depends on the ability of the agents to adapt to each
        other’s behavior. A natural solution concept in such settings is the
        Stackelberg equilibrium, in which the “leader” agent selects the
        strategy that maximizes its own payoff given that the “follower”
        agent will choose their best response to this strategy. Recent work
        has extended this solution concept to two-player differentiable
        games, such as those arising from multi-agent deep reinforcement
        learning, in the form of the differential Stackelberg equilibrium.
        While this previous work has presented learning dynamics which
        converge to such equilibria, these dynamics are “coupled” in the
        sense that the learning updates for the leader’s strategy require
        some information about the follower’s payoff function. As such,
        these methods cannot be applied to truly decentralised multi-agent
        settings, particularly ad hoc cooperation, where each agent only
        has access to its own payoff function. In this work we present
        “uncoupled” learning dynamics based on zeroth-order gradient
        estimators, in which each agent’s strategy update depends only on
        their observations of the other’s behavior. We analyze the
        convergence of these dynamics in general-sum games, and prove that
        they converge to differential Stackelberg equilibria under the same
        conditions as previous coupled methods. Furthermore, we present
        an online mechanism by which symmetric learners can negotiate
        leader-follower roles. We conclude with a discussion of the
        implications of our work for multi-agent reinforcement learning
        and ad hoc collaboration more generally.
    }
}
