Publications

PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Jordi Smit, Canmanie Ponnambalam, Matthijs T.J. Spaan, and Frans A. Oliehoek. PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning. In IJCAI Workshop on Robust and Reliable Autonomy in the Wild (R2AW), August 2021.

Abstract

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naively applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to overestimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower-bound estimate of the value. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial, and the development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.

BibTeX Entry

@inproceedings{Smit21IJCAIWS,
  title     = {{PEBL}: Pessimistic Ensembles for Offline Deep Reinforcement Learning},
  author    = {Jordi Smit and Canmanie Ponnambalam and Matthijs T.J. Spaan and Frans A. Oliehoek},
  booktitle = {IJCAI Workshop on Robust and Reliable Autonomy in the Wild (R2AW)},
  year      = 2021,
  month     = aug,
  keywords  = {refereed},
  abstract  = {Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naively applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to overestimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower-bound estimate of the value. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial, and the development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.}
}
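The abstract describes using a multi-headed bootstrap ensemble to obtain an uncertainty-based pessimistic (lower-bound) value estimate. As a rough illustration of that general idea only, the PyTorch sketch below builds a Q-network with a shared torso and several bootstrap heads, then penalizes the ensemble mean by its standard deviation; the class and function names, head count, and the scale parameter beta are illustrative assumptions, not the paper's actual architecture or penalty.

import torch
import torch.nn as nn

class MultiHeadQNetwork(nn.Module):
    """Q-network with a shared torso and several bootstrap heads.

    Each head would be trained on its own bootstrap sample of the offline
    data, so disagreement between heads serves as an uncertainty estimate.
    """

    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10, hidden: int = 256):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        """Return per-head Q-values with shape (n_heads, batch, n_actions)."""
        z = self.torso(obs)
        return torch.stack([head(z) for head in self.heads])


def pessimistic_q(q_net: MultiHeadQNetwork, obs: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Lower-bound Q estimate: ensemble mean minus beta times ensemble std."""
    q_all = q_net(obs)                               # (n_heads, batch, n_actions)
    return q_all.mean(dim=0) - beta * q_all.std(dim=0)

Acting greedily with respect to pessimistic_q (or bootstrapping targets from it) discourages the policy from exploiting state-action pairs the data set does not cover, which is the failure mode of overestimation the abstract points to.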