Publications

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Elise Van der Pol, Daniel E. Worrall, Herke Van Hoof, Frans A. Oliehoek, and Max Welling. MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning. In Advances in Neural Information Processing Systems 33, December 2020.

Abstract

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
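The equivariance constraint mentioned in the abstract can be illustrated with a small numerical sketch. The example below assumes a CartPole-like setting (4 state features, 2 action logits) in which reflecting the state negates every feature and swaps the two actions. It builds an equivariant linear layer by averaging a random weight matrix over the group, which is one common numerical approach and not necessarily the exact procedure used in the paper. All names (`symmetrize`, `reflection_group`, `apply_layer`) are hypothetical and chosen for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def symmetrize(w, group):
    """Average L_g^{-1} W K_g over every (K_g, L_g^{-1}) pair in the group."""
    rows, cols = len(w), len(w[0])
    total = [[0.0] * cols for _ in range(rows)]
    for k_g, l_g_inv in group:
        term = matmul(matmul(l_g_inv, w), k_g)
        for i in range(rows):
            for j in range(cols):
                total[i][j] += term[i][j] / len(group)
    return total

def apply_layer(weights, x):
    """Apply a linear layer (no bias) to an input vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

# Assumed group representations (illustrative, not taken from the paper):
# K_g acts on the 4-dimensional state, L_g acts on the 2 action logits.
identity4 = [[float(i == j) for j in range(4)] for i in range(4)]
negate4 = [[-float(i == j) for j in range(4)] for i in range(4)]
identity2 = [[1.0, 0.0], [0.0, 1.0]]
swap2 = [[0.0, 1.0], [1.0, 0.0]]  # a swap is its own inverse

# Each entry pairs the input transform K_g with the inverse output transform.
reflection_group = [(identity4, identity2), (negate4, swap2)]

random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
w_eq = symmetrize(w, reflection_group)  # satisfies W K_g = L_g W

state = [0.1, -0.5, 0.02, 0.3]
flipped = [-s for s in state]
logits = apply_layer(w_eq, state)
logits_flipped = apply_layer(w_eq, flipped)
```

With the symmetrized weights, reflecting the input state swaps the two output logits, so `logits_flipped` is `logits` with its entries exchanged (up to floating-point error). An unconstrained matrix `w` would not have this property in general.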

BibTeX Entry

@inproceedings{VanDerPol20NeurIPS,
    author =    {Van der Pol, Elise and Worrall, Daniel E. and Van Hoof, Herke and
                 Oliehoek, Frans A. and Welling, Max},
    title =     {{MDP} Homomorphic Networks: Group Symmetries in Reinforcement Learning},
    booktitle = {Advances in Neural Information Processing Systems 33},
    year =      2020,
    month =     dec,
    keywords =  {refereed},
    abstract = {
    This paper introduces MDP homomorphic networks for deep reinforcement learning.
    MDP homomorphic networks are neural networks that are equivariant under
    symmetries in the joint state-action space of an MDP. Current approaches to
    deep reinforcement learning do not usually exploit knowledge about such
    structure. By building this prior knowledge into policy and value networks
    using an equivariance constraint, we can reduce the size of the solution
    space. We specifically focus on group-structured symmetries (invertible
    transformations). Additionally, we introduce an easy method for
    constructing equivariant network layers numerically, so the system designer
    need not solve the constraints by hand, as is typically done. We construct
    MDP homomorphic MLPs and CNNs that are equivariant under either a group of
    reflections or rotations. We show that such networks converge faster than
    unstructured baselines on CartPole, a grid world and Pong.
    }
}
