On this page we show videos of our experimental results in two environments: Myopic Breakout and Traffic Control.
The InfluenceNet model (PPO+InfluenceNet) learns the “tunnel” strategy: it creates an opening on the left (or right) side and plays the ball into it to score a large number of points:
The feedforward network (PPO+FNN), which has no internal memory, performs considerably worse than the InfluenceNet model:
A side-by-side comparison of the local model (left, PPO+FNN) and the InfluenceNet model (right, PPO+InfluenceNet) in the setting with 10 cars and a yellow time of 4:
The following two videos show behavior under settings that differ from those in the paper; they are part of ongoing experimentation.
The behavior of the local model in the setting with 15 cars and a yellow time of 8:
The behavior of the InfluenceNet model in the setting with 15 cars and a yellow time of 8:
New results: Traffic Control
The Traffic Control task was modified as follows:
- The size of the observable region was slightly reduced, and the delay between the moment an action is taken and the moment the lights actually switch was increased to 6 seconds. During these 6 seconds, the green light turns yellow.
- The speed penalty was removed; the only remaining reward signal is a penalty of -0.1 for every car that is stopped at a traffic light (see the sketch after this list).
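To make these modifications concrete, here is a minimal Python sketch of the new reward and switching delay. It is an illustration under our own naming assumptions (`TrafficLight`, `reward`, `SWITCH_DELAY`, etc.), not code from the actual environment:

```python
# Minimal sketch of the modified Traffic Control dynamics described above.
# All identifiers are illustrative assumptions, not names from the real code.

SWITCH_DELAY = 6            # seconds between the switch action and the lights changing
STOPPED_CAR_PENALTY = -0.1  # only remaining reward signal


def reward(num_stopped_cars: int) -> float:
    """The speed penalty is gone: the agent is penalized -0.1 per car
    currently stopped at a traffic light."""
    return STOPPED_CAR_PENALTY * num_stopped_cars


class TrafficLight:
    """When a switch is requested, the green light first turns yellow and
    only completes the switch after SWITCH_DELAY seconds."""

    def __init__(self) -> None:
        self.state = "green"  # "green", "yellow", or "red"
        self.countdown = 0

    def request_switch(self) -> None:
        if self.state == "green":
            self.state = "yellow"
            self.countdown = SWITCH_DELAY

    def tick(self) -> None:
        """Advance the simulation by one second."""
        if self.state == "yellow":
            self.countdown -= 1
            if self.countdown <= 0:
                self.state = "red"


if __name__ == "__main__":
    light = TrafficLight()
    light.request_switch()
    for t in range(7):
        print(t, light.state, reward(num_stopped_cars=2))
        light.tick()
```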
As shown in the video below, a memoryless agent can only switch the lights once a car enters the local region. Under the new settings, this means the light turns green too late and the cars have to stop:
The InfluenceNet agent, on the other hand, anticipates that a car is about to enter the local region and switches the lights just in time for the cars to continue without stopping: