Publications


Robust Ensemble Adversarial Model-Based Reinforcement Learning

Daniele Foffano, Jinke He, and Frans A. Oliehoek. Robust Ensemble Adversarial Model-Based Reinforcement Learning. In Proceedings of the AAMAS Workshop on Adaptive Learning Agents (ALA), May 2022.

Download

pdf [624.8kB]  

Abstract

Model-Based Reinforcement Learning (MBRL) algorithms solve sequential decision-making problems, usually formalized as Markov Decision Processes, using a model of the environment dynamics to compute the optimal policy. When dealing with complex environments, the environment dynamics are frequently approximated with function approximators (such as neural networks) that are not guaranteed to converge to an optimal solution. As a consequence, the planning process using samples generated by an imperfect model is also not guaranteed to converge to the optimal policy. In fact, the mismatch between the source and target dynamics distributions can result in compounding errors, leading to poor algorithm performance during testing. To mitigate this, we combine the Robust Markov Decision Processes (RMDPs) framework with an ensemble of models to take into account the uncertainty in the approximation of the dynamics. With RMDPs, we can study the uncertainty problem as a two-player stochastic game in which Player 1 aims to maximize the expected return and Player 2 wants to minimize it. Using an ensemble of models, Player 2 can choose the worst model to carry out the transitions when performing rollouts for policy improvement. We present Robust Ensemble AdversariaL (REAL) MBRL, an ensemble-based algorithm that leverages an adversarial agent to compute a policy more robust to model errors. We propose two adversarial approaches: one with a greedy adversary and one with an epsilon-random adversary. We experimentally show that finding a maximin strategy for this two-player game results in a policy robust to model errors, leading to better performance compared to assuming the learned dynamics to be correct.
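
To make the adversarial rollout idea concrete, here is a minimal Python sketch of the two adversary variants described in the abstract. All names (`greedy_adversary`, `epsilon_random_adversary`, `rollout`, `value_fn`) are illustrative assumptions for this page, not the authors' actual implementation: models are assumed to be callables mapping a state-action pair to a next state, and the adversary picks the ensemble member whose predicted next state has the lowest value.

```python
import random

def greedy_adversary(models, state, action, value_fn):
    """Player 2: choose the ensemble member whose predicted next state
    has the lowest value, i.e. the worst-case transition for the agent."""
    return min(models, key=lambda m: value_fn(m(state, action)))

def epsilon_random_adversary(models, state, action, value_fn, eps=0.1):
    """With probability eps pick a random model, otherwise act greedily
    (worst case); this softens the adversary's pressure during training."""
    if random.random() < eps:
        return random.choice(models)
    return greedy_adversary(models, state, action, value_fn)

def rollout(state, policy, models, value_fn, horizon,
            adversary=greedy_adversary):
    """Generate an imagined trajectory in which every transition is
    carried out by the model selected by the adversary."""
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        model = adversary(models, state, action, value_fn)
        next_state = model(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory
```

Training the policy on trajectories generated this way is what yields the maximin behavior: the policy improves against whichever model is currently worst for it, rather than trusting a single learned dynamics model.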

BibTeX Entry

@inproceedings{Foffano22ALA,
    author =    {Foffano, Daniele and
                 He, Jinke and
                 Oliehoek, Frans A.},
    title =     {Robust Ensemble Adversarial Model-Based Reinforcement Learning},
    booktitle = {Proceedings of the AAMAS Workshop on Adaptive Learning Agents (ALA)},
    year =      2022,
    month =     may,
    keywords =   {refereed},
    abstract = {
    Model-Based Reinforcement Learning (MBRL) algorithms solve sequential decision-making problems, usually formalized as Markov Decision Processes, using a model of the environment dynamics to compute the optimal policy. When dealing with complex environments, the environment dynamics are frequently approximated with function approximators (such as neural networks) that are not guaranteed to converge to an optimal solution. As a consequence, the planning process using samples generated by an imperfect model is also not guaranteed to converge to the optimal policy. In fact, the mismatch between the source and target dynamics distributions can result in compounding errors, leading to poor algorithm performance during testing. To mitigate this, we combine the Robust Markov Decision Processes (RMDPs) framework with an ensemble of models to take into account the uncertainty in the approximation of the dynamics. With RMDPs, we can study the uncertainty problem as a two-player stochastic game in which Player 1 aims to maximize the expected return and Player 2 wants to minimize it. Using an ensemble of models, Player 2 can choose the worst model to carry out the transitions when performing rollouts for policy improvement. We present Robust Ensemble AdversariaL (REAL) MBRL, an ensemble-based algorithm that leverages an adversarial agent to compute a policy more robust to model errors. We propose two adversarial approaches: one with a greedy adversary and one with an epsilon-random adversary. We experimentally show that finding a maximin strategy for this two-player game results in a policy robust to model errors, leading to better performance compared to assuming the learned dynamics to be correct.
    }
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Sep 13, 2022 23:24:35 UTC