We have some first results on using influence-based abstractions in the context of deep reinforcement learning, which will be presented at the ALA workshop in Montreal.
Reinforcement learning is tough. POMDPs are hard. And doing RL in partially observable problems is a huge challenge. With Sammie and Chris Amato, I have been making progress toward getting a principled method (based on Monte Carlo tree search) to scale to structured problems. We can learn how to act and, at the same time, learn the structure of the problem. See the paper and bib.
On this page, we show some videos of our experimental results in two different environments, Myopic Breakout and Traffic Control.
The InfluenceNet model (PPO-InfluenceNet) is able to learn the “tunnel” strategy, where it creates an opening on the left (or right) side and plays the ball in there to score a lot of points:
The feedforward network with no internal memory performs considerably worse than the InfluenceNet model:
The Traffic Control task was modified as follows:
As shown in the video below, a memoryless agent can only switch the lights once a car has entered the local region. With the modified task, this means the light turns green too late and the cars have to stop:
On the other hand, the InfluenceNet agent is able to anticipate that a car will be entering the local region and thus switch the lights just in time for the cars to continue without stopping:
Can deep Q-networks etc. brute force their way through tough coordination problems…? Perhaps not. Jacopo’s work, accepted as an extended abstract at AAMAS’19, takes a first step in exploring this in the one-shot setting.
Not so surprising: a “joint Q-learner” can be too large and slow to train, and “individual Q-learners” can fail to find good representations.
But good to know: “factored Q-value functions”, which represent the Q-function as a sum of components, each involving a random subset of 2 or 3 agents, can do quite well, even for hard coordination tasks!
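To make the factored idea concrete, here is a minimal tabular sketch in the one-shot setting. It is not the code from Jacopo's paper (which uses neural networks); the names FactoredQ and random_factorization, the tabular representation, and the choice to split each error evenly over the factors are all illustrative assumptions. The Q-value of a joint action is the sum of small per-factor tables, each indexed only by the actions of the 2 or 3 agents in that factor.

```python
import itertools
import random


def random_factorization(n_agents, n_factors, rng):
    """Hypothetical scheme: draw n_factors random agent subsets of size 2 or 3."""
    return [tuple(sorted(rng.sample(range(n_agents), rng.choice([2, 3]))))
            for _ in range(n_factors)]


class FactoredQ:
    """Tabular sketch of Q(a) = sum_e Q_e(a_e) for a one-shot multi-agent game."""

    def __init__(self, n_agents, n_actions, factors):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.factors = factors
        # One small table per factor, keyed by the local joint action of its agents.
        self.tables = [
            {local: 0.0 for local in itertools.product(range(n_actions), repeat=len(f))}
            for f in factors
        ]

    def q(self, joint_action):
        # Each component only looks at the actions of the agents in its factor.
        return sum(table[tuple(joint_action[i] for i in f)]
                   for f, table in zip(self.factors, self.tables))

    def update(self, joint_action, reward, lr=0.1):
        # One-shot regression toward the observed reward; the error is split
        # evenly over the factors so that lr=1.0 moves Q(a) exactly to the reward.
        error = reward - self.q(joint_action)
        for f, table in zip(self.factors, self.tables):
            table[tuple(joint_action[i] for i in f)] += lr * error / len(self.factors)

    def greedy(self):
        # Brute-force maximisation over all joint actions; only viable for few agents.
        return max(itertools.product(range(self.n_actions), repeat=self.n_agents),
                   key=self.q)

    def num_params(self):
        return sum(len(table) for table in self.tables)
```

The payoff is in the table sizes: with 6 agents and 4 actions each, a joint Q-learner needs 4^6 = 4096 entries, whereas 5 factors of at most 3 agents need at most 5 · 4^3 = 320. Individual learners would be smaller still, but they cannot represent any interaction between agents, which is where the 2-3 agent components help.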
My invited IJCAI paper, which gives an overview of (some of) my research, is now available from my publications page. Apologies to some coauthors: I could not fit everything in there.