This paper tackles the problem of active perception: taking actions to minimize one’s uncertainty. It further formalizes the link between information gain and prediction rewards, and uses this to propose a deep-learning approach to optimize active perception from a data set, thus obviating the need for a complex POMDP model.
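A rough illustration of that link (a sketch of the general idea, not the paper's exact formulation): if an agent is rewarded with the log-probability it assigns to the true state (a proper scoring rule), then the best prediction it can make is its own belief, and the resulting expected prediction reward equals the negative entropy of that belief. Maximizing expected prediction reward then coincides with minimizing uncertainty.

```python
import math

def entropy(belief):
    """Shannon entropy of a discrete belief (in nats)."""
    return -sum(p * math.log(p) for p in belief if p > 0)

def expected_log_score(belief, prediction):
    """Expected log-probability reward when the true state is drawn
    from `belief` and the agent predicts distribution `prediction`."""
    return sum(p * math.log(q) for p, q in zip(belief, prediction) if p > 0)

b = [0.7, 0.2, 0.1]
# Predicting the belief itself maximizes the expected log score,
# and the optimum equals the negative entropy of the belief.
assert abs(expected_log_score(b, b) - (-entropy(b))) < 1e-12
# Any other prediction does strictly worse (Gibbs' inequality).
assert expected_log_score(b, [1/3, 1/3, 1/3]) < expected_log_score(b, b)
```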
Aleksander Czechowski got his paper on Decentralized MCTS via Learned Teammate Models accepted at IJCAI 2020.
In this paper, each agent learns models of the other agents, which it then uses to predict the future. Stay tuned for the camera-ready version.
Elise van der Pol, together with Thomas Kipf, Max Welling, and myself, did some excellent work on model-based RL.
- This post on plannable approximations with MDP homomorphisms
- The paper
I will be co-organizing an AAAI Spring Symposium on “Challenges and Opportunities for Multi-Agent Reinforcement Learning”. We want to make it a workshop with some actual ‘work’. Please read here for more info.
Please find the new website here:
Identify the main challenges and opportunities in multiagent reinforcement learning (MARL).
In particular, we aim to organize an active workshop with many interactive (breakout) sessions to investigate fundamental issues that hinder the applicability of MARL for solving complex real world problems.
Submission: November 1st, 2019
Notification: December 6th, 2019
Symposium: March 23-25 2020
We live in a multi-agent world, and to be successful in that world, intelligent agents will need to learn to take into account the agency of others. They will need to compete in marketplaces, cooperate in teams, communicate with others, coordinate their plans, and negotiate outcomes. Examples include self-driving cars interacting in traffic, personal assistants acting on behalf of humans and negotiating with other agents, swarms of unmanned aerial vehicles, financial trading systems, robotic teams, and household robots.
There has been a lot of great work on multi-agent reinforcement learning (MARL) in the past decade, but significant challenges remain, including:
- the difficulty of learning an optimal model/policy from a partial signal,
- learning to cooperate/compete in non-stationary environments with distributed, simultaneously learning agents,
- the interplay between abstraction and influence of other agents,
- the exploration vs. exploitation dilemma,
- the scalability and effectiveness of learning algorithms,
- avoiding social dilemmas, and
- learning emergent communication.
The purpose of this symposium is to bring together researchers in multiagent reinforcement learning, but also, more broadly, in machine learning and multiagent systems, to explore some of these and other challenges in more detail. The main goal is to broaden the scope of MARL research and to address the fundamental issues that hinder the applicability of MARL for solving complex real-world problems.
We aim to organize an active workshop, with many interactive (brainstorm/breakout) sessions. We hope that this will form the basis for ongoing collaborations on these challenges between the attendees, and we aim for several position papers as concrete outcomes.
Call for Papers
Authors can submit papers of 1-4 pages, which will be reviewed by the organizing committee. We are looking for position papers that present a challenge or opportunity for MARL research, on a topic the authors not only wish to interact on but also ‘work’ on with other participants during the symposium. We also welcome (preliminary) research papers that describe new perspectives on dealing with MARL challenges. We are not looking for summaries of current research: papers should clearly state some limitation(s) of current methods and potential ways these could be overcome. Submissions will be handled through EasyChair.
In the lead-up to the workshop, we will contact both authors of accepted papers and others who indicated they will participate, asking them for a short description of the topic or challenge they would like to work on during the symposium. We will try to distill these into a number of core questions to work on. The ultimate goal is for the symposium to produce a number of joint position papers among participants that can grow into conference submissions within a year of the symposium.
College of Computer and Information Science, Northeastern University
Dept. of Intelligent Systems, Delft University of Technology
Google DeepMind, Paris
Google DeepMind, Paris
This is the question that De Volkskrant asked me to comment on. Find the piece here (in Dutch).
Reinforcement learning is tough. POMDPs are hard. And doing RL in partially observable problems is a huge challenge. With Sammie and Chris Amato, I have been making progress on getting a principled method (based on Monte Carlo tree search) to scale to structured problems. We can learn how to act and, at the same time, learn the structure of the problem. See the paper and bib.
On this page, we show some videos of our experimental results in two different environments, Myopic Breakout and Traffic Control.
The InfluenceNet model (PPO-InfluenceNet) is able to learn the “tunnel” strategy, where it creates an opening on the left (or right) side and plays the ball in there to score a lot of points:
The feedforward network with no internal memory performs considerably worse than the InfluenceNet model:
The Traffic Control task was modified as follows:
- The size of the observable region was slightly reduced, and the delay between the moment an action is taken and the moment the lights switch was increased to 6 seconds. During these 6 seconds, the green light turns yellow.
- The speed penalty was removed; there is only a reward of -0.1 for every car that is stopped at a traffic light.
As shown in the video below, a memoryless agent can only switch the lights when a car enters the local region. With the new changes, this means that the light turns green too late and the cars have to stop:
On the other hand, the InfluenceNet agent is able to anticipate that a car will be entering the local region and thus switch the lights just in time for the cars to continue without stopping:
Can deep Q-networks etc. brute-force their way through tough coordination problems? Perhaps not. Jacopo’s work, accepted as an extended abstract at AAMAS’19, takes a first step in exploring this question in the one-shot setting.
Not so surprising: a “joint Q-learner” can be too large/slow, and “individual Q-learners” can fail to find good representations.
But good to know: “factored Q-value functions”, which represent the Q-function as a random mixture of components each involving 2 or 3 agents, can do quite well, even on hard coordination tasks!
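A minimal sketch of the idea (the names, sizes, and random values here are my own illustration, not the paper's setup): each component Q-function looks only at the actions of a small subset of agents, here random pairs, and the joint Q-value is the sum of the components. One-shot coordination then amounts to maximizing this sum over joint actions.

```python
import itertools
import random

n_agents, n_actions = 4, 3
random.seed(0)

# Each component covers a random pair of agents and stores a small
# table over that pair's joint actions (a random mixture of components).
scopes = [tuple(random.sample(range(n_agents), 2)) for _ in range(5)]
components = [
    {ja: random.random() for ja in itertools.product(range(n_actions), repeat=2)}
    for _ in scopes
]

def joint_q(joint_action):
    """Factored Q-value: the sum of component values, each depending
    only on the actions of the agents in its scope."""
    return sum(
        comp[tuple(joint_action[i] for i in scope)]
        for scope, comp in zip(scopes, components)
    )

# Brute-force maximization over joint actions (exponential in n_agents;
# in practice one would exploit the factorization, e.g. via max-plus).
best = max(itertools.product(range(n_actions), repeat=n_agents), key=joint_q)
print(best, joint_q(best))
```

The point of the factorization is the middle ground it offers: the tables grow only with the component scopes (here 3^2 entries each) rather than with the full joint action space (3^4), while still capturing pairwise coordination that independent learners miss.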