Challenges and Opportunities for Multi-Agent Reinforcement Learning – AAAI Spring Symposium 2020




Please find the new website here:

https://sites.google.com/view/comarl-aaai-2020/




Goal

Identify the main challenges and opportunities in multi-agent reinforcement learning (MARL).

In particular, we aim to organize an active workshop with many interactive (breakout) sessions to investigate fundamental issues that hinder the applicability of MARL to complex real-world problems.

Key Dates

Submission: November 1st, 2019
Notification: December 6th, 2019
Symposium: March 23-25, 2020

Detailed Description

We live in a multi-agent world, and to be successful in that world, intelligent agents will need to learn to take into account the agency of others. They will need to compete in marketplaces, cooperate in teams, communicate with others, coordinate their plans, and negotiate outcomes. Examples include self-driving cars interacting in traffic, personal assistants acting on behalf of humans and negotiating with other agents, swarms of unmanned aerial vehicles, financial trading systems, robotic teams, and household robots.

There has been a lot of great work on multi-agent reinforcement learning (MARL) in the past decade, but significant challenges remain, including:

  • the difficulty of learning an optimal model/policy from a partial signal,
  • learning to cooperate/compete in non-stationary environments with distributed, simultaneously learning agents,
  • the interplay between abstraction and influence of other agents,
  • the exploration vs. exploitation dilemma,
  • the scalability and effectiveness of learning algorithms,
  • avoiding social dilemmas, and
  • learning emergent communication. 

The purpose of this symposium is to bring together researchers in multi-agent reinforcement learning, as well as in machine learning and multi-agent systems more broadly, to explore some of these and other challenges in more detail. The main goal is to broaden the scope of MARL research and to address the fundamental issues that hinder the applicability of MARL to complex real-world problems.

We aim to organize an active workshop with many interactive (brainstorm/breakout) sessions. We hope that this will form the basis for ongoing collaborations on these challenges between the attendees, and we aim for several position papers as concrete outcomes.

Call for Papers

Authors can submit papers of 1-4 pages, which will be reviewed by the organizing committee. We are looking for position papers that present a challenge or opportunity for MARL research, on a topic the authors not only wish to discuss but also actively work on with other participants during the symposium. We also welcome (preliminary) research papers that describe new perspectives on dealing with MARL challenges, but we are not looking for summaries of current research: papers should clearly state some limitation(s) of current methods and potential ways these could be overcome. Submissions will be handled through EasyChair.

Challenge Descriptions

In the lead-up to the workshop, we will contact both authors of accepted papers and others who have indicated they will participate, asking them for a short description of the topic or challenge they would like to work on during the symposium. We will try to distill these into a number of core questions to work on. The ultimate goal of the symposium is to produce a number of joint position papers between participants that can grow into conference submissions within a year of the symposium.

Organizing Committee

Christopher Amato
College of Computer and Information Science, Northeastern University
camato@ccs.neu.edu

Frans Oliehoek
Dept. of Intelligent Systems, Delft University of Technology
f.a.oliehoek@tudelft.nl

Shayegan Omidshafiei
Google DeepMind, Paris
somidshafiei@google.com

Karl Tuyls
Google DeepMind, Paris
karltuyls@google.com

Scaling Bayesian RL for Factored POMDPs

Reinforcement learning is tough. POMDPs are hard. And doing RL in partially observable problems is a huge challenge. With Sammie and Chris Amato, I have been making some progress on getting a principled method (based on Monte Carlo tree search) to scale to structured problems. We can learn how to act and learn the structure of the problem at the same time. See the paper and bib.
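A common ingredient in this line of work is root sampling: at the start of every simulation, draw one complete model from the current posterior (and a state from the belief) and plan as if that sampled model were the truth. The snippet below is a deliberately minimal, flat-rollout sketch of that general idea, not the tree-search method from the paper; the posterior_sample callable and the model.step(state, action) interface are assumptions made for the example.

    import random
    from collections import defaultdict

    def root_sampling_planner(posterior_sample, belief, actions,
                              num_simulations=1000, horizon=10, gamma=0.95):
        """Toy illustration of root sampling for Bayesian RL planning.

        Each simulation draws a full model from the posterior (via the
        hypothetical `posterior_sample` callable) and a state from the belief
        (here a particle set), then scores a first action by a random rollout
        under that sampled model. Tree-search methods like BA-POMCP build a
        search tree instead of doing flat rollouts."""
        returns = defaultdict(list)
        for _ in range(num_simulations):
            model = posterior_sample()          # hypothetical: returns an object with .step()
            state = random.choice(belief)       # belief approximated by particles
            first_action = random.choice(actions)
            total, discount, s, a = 0.0, 1.0, state, first_action
            for _ in range(horizon):
                s, reward = model.step(s, a)    # hypothetical sampled-model interface
                total += discount * reward
                discount *= gamma
                a = random.choice(actions)      # uniform rollout policy
            returns[first_action].append(total)
        # choose the first action with the highest average simulated return
        return max(actions, key=lambda act: sum(returns[act]) / max(len(returns[act]), 1))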

Video Demo with ALA submission ‘Influence-Based Abstraction in Deep Reinforcement Learning’

On this page, we show some videos of our experimental results in two different environments, Myopic Breakout and Traffic Control.

Myopic Breakout

The InfluenceNet model (PPO-InfluenceNet) is able to learn the “tunnel” strategy, where it creates an opening on the left (or right) side and plays the ball into it to score a lot of points:

[Video: PPO-InfluenceNet learning the tunnel strategy in Myopic Breakout]
The feedforward network with no internal memory performs considerably worse than the InfluenceNet model:

[Video: feedforward agent without memory in Myopic Breakout]

Traffic Control

The Traffic Control task was modified as follows:

  • The size of the observable region was slightly reduced, and the delay between the moment an action is taken and the time the lights switch was increased to 6 seconds. During these 6 seconds the green light turns yellow.
  • The speed penalty was removed; there is only a negative reward of -0.1 for every car that is stopped at a traffic light (a minimal sketch of this reward is given below).
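For concreteness, here is a minimal sketch of that modified reward, assuming the simulator exposes per-car speeds and whether a car is queued at a light; the Car structure, its field names, and the speed threshold are illustrative assumptions, not the environment's actual code.

    from dataclasses import dataclass

    @dataclass
    class Car:
        speed: float            # current speed (illustrative units)
        at_traffic_light: bool  # True if the car is queued at a traffic light

    def traffic_reward(cars, stopped_speed=0.0):
        """Modified reward described above: no speed penalty, only a -0.1
        penalty for every car currently stopped at a traffic light."""
        stopped = sum(1 for car in cars
                      if car.at_traffic_light and car.speed <= stopped_speed)
        return -0.1 * stopped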

As shown in the video below, a memoryless agent can only switch the lights when a car enters the local region. With the new changes, this means that the light turns green too late and the cars have to stop:

[Video: memoryless agent in Traffic Control]

On the other hand, the InfluenceNet agent is able to anticipate that a car will be entering the local region and thus switch the lights just in time for the cars to continue without stopping:

[Video: InfluenceNet agent in Traffic Control]

At AAMAS: Deep learning of Coordination…?

Can deep Q-networks and the like brute-force their way through tough coordination problems…? Perhaps not. Jacopo’s work, accepted as an extended abstract at AAMAS’19, takes a first step in exploring this question in the one-shot setting.

Not so surprising: a “joint Q-learner” can be too large/slow, and “individual Q-learners” can fail to find good representations.

But good to know: “factored Q-value functions”, which represent the Q-function as a random mixture of components involving 2 or 3 agents, can do quite well, even for hard coordination tasks!
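To make the idea concrete, here is a small, self-contained sketch of a factored Q-value function for a one-shot coordination problem: the joint Q-value is a sum of component tables, each defined over a randomly chosen pair of agents, and each component is nudged towards the observed reward. The sizes, the pairwise grouping, and the update rule are assumptions made for this sketch, not the exact setup of Jacopo's paper.

    import itertools
    import random
    import numpy as np

    class FactoredQ:
        """Joint Q-value as a sum of small components over random agent subsets."""

        def __init__(self, n_agents=6, n_actions=3, n_components=8, group_size=2, seed=0):
            rng = random.Random(seed)
            self.n_agents, self.n_actions = n_agents, n_actions
            self.groups = [tuple(rng.sample(range(n_agents), group_size))
                           for _ in range(n_components)]
            # one small table per component, indexed by the group's local joint action
            self.tables = [np.zeros((n_actions,) * group_size) for _ in self.groups]

        def joint_q(self, joint_action):
            """Sum the component values for a full joint action (tuple of ints)."""
            return sum(table[tuple(joint_action[i] for i in group)]
                       for group, table in zip(self.groups, self.tables))

        def best_joint_action(self):
            """Brute-force argmax over joint actions (only feasible for tiny problems)."""
            candidates = itertools.product(range(self.n_actions), repeat=self.n_agents)
            return max(candidates, key=self.joint_q)

        def update(self, joint_action, reward, lr=0.1):
            """One-shot (bandit-style) update: spread the prediction error over
            every component that the joint action touches."""
            error = reward - self.joint_q(joint_action)
            for group, table in zip(self.groups, self.tables):
                table[tuple(joint_action[i] for i in group)] += lr * error / len(self.groups)

Greedy action selection here is a brute-force enumeration, which only works for a handful of agents; the point of the factored representation is that each component stays small even as the number of agents grows.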