Publications


Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Learning in POMDPs with Monte Carlo Tree Search. ArXiv e-prints, June 2018.
Version of our ICML paper including all proofs; also available from arXiv.

Download

pdf [596.8kB]  

Abstract

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.
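
To make the core idea concrete, here is a minimal, hypothetical Python sketch in the spirit of BA-POMCP on a toy two-state domain: the search state couples the domain state with Dirichlet counts over (s, a) -> (s', o) outcomes, root states are sampled from a particle belief, and each simulated step both samples from and updates the counts. The toy domain, reward function, constants, and all names below are illustrative assumptions, not the paper's implementation, benchmarks, or its specific efficiency techniques.

import math
import random
from collections import defaultdict
from copy import deepcopy

STATES = [0, 1]
ACTIONS = [0, 1]
OBS = [0, 1]
GAMMA, UCB_C, N_SIMS, MAX_DEPTH = 0.95, 2.0, 500, 10

def reward(s, a):
    # Hypothetical known reward: the action must match the hidden state.
    return 1.0 if a == s else -1.0

def init_counts():
    # Uniform Dirichlet prior: count 1 for every (s', o) outcome of every (s, a).
    return {(s, a): {(s2, o): 1.0 for s2 in STATES for o in OBS}
            for s in STATES for a in ACTIONS}

def sample_step(s, a, chi):
    # Sample (s', o) in proportion to the Dirichlet counts, then update them,
    # so later steps of the same simulation use the posterior counts.
    counts = chi[(s, a)]
    outcomes = list(counts)
    s2, o = random.choices(outcomes, weights=[counts[x] for x in outcomes])[0]
    counts[(s2, o)] += 1.0
    return s2, o

class Node:
    def __init__(self):
        self.n = defaultdict(int)      # per-action visit counts
        self.q = defaultdict(float)    # per-action value estimates

def ucb_action(node):
    # Standard UCB1 selection over the actions at this history node.
    total = sum(node.n.values()) + 1
    def score(a):
        if node.n[a] == 0:
            return float("inf")
        return node.q[a] + UCB_C * math.sqrt(math.log(total) / node.n[a])
    return max(ACTIONS, key=score)

def rollout(s, chi, depth):
    # Random-policy rollout through the count-augmented dynamics.
    if depth >= MAX_DEPTH:
        return 0.0
    a = random.choice(ACTIONS)
    s2, _ = sample_step(s, a, chi)
    return reward(s, a) + GAMMA * rollout(s2, chi, depth + 1)

def simulate(s, chi, history, depth, tree):
    # One POMCP-style simulation; histories index the search tree.
    if depth >= MAX_DEPTH:
        return 0.0
    if history not in tree:
        tree[history] = Node()
        return rollout(s, chi, depth)
    node = tree[history]
    a = ucb_action(node)
    s2, o = sample_step(s, a, chi)
    ret = reward(s, a) + GAMMA * simulate(s2, chi, history + ((a, o),), depth + 1, tree)
    node.n[a] += 1
    node.q[a] += (ret - node.q[a]) / node.n[a]
    return ret

def plan(belief_particles, history=()):
    # Root sampling: each simulation starts from a fresh copy of one
    # (state, counts) particle, so count updates never leak across simulations.
    tree = {}
    for _ in range(N_SIMS):
        s, chi = deepcopy(random.choice(belief_particles))
        simulate(s, chi, history, 0, tree)
    root = tree[history]
    return max(ACTIONS, key=lambda a: root.q[a])

if __name__ == "__main__":
    particles = [(random.choice(STATES), init_counts()) for _ in range(20)]
    print("chosen action:", plan(particles))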

BibTeX Entry

@article{Katt18arxiv,
    title =         {Learning in {POMDPs} with {Monte Carlo} Tree Search},
    author =        {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
    journal =       {ArXiv e-prints},
    archivePrefix = {arXiv},
    eprint =        {1806.05631},
    primaryClass =  {cs.AI},
    keywords =      {Computer Science - Artificial Intelligence, Computer Science - Learning, nonrefereed, arxiv},
    year =          2018,
    month =         jun,
    wwwnote =       {Version of our ICML paper including all proofs; also available from <a href="https://arxiv.org/abs/1806.05631">arXiv</a>.},
    abstract =      {The POMDP is a powerful framework for reasoning under outcome and
                     information uncertainty, but constructing an accurate POMDP model is
                     difficult. Bayes-Adaptive Partially Observable Markov Decision
                     Processes (BA-POMDPs) extend POMDPs to allow the model to be learned
                     during execution. BA-POMDPs are a Bayesian RL approach that, in
                     principle, allows for an optimal trade-off between exploitation and
                     exploration. Unfortunately, BA-POMDPs are currently impractical to
                     solve for any non-trivial domain. In this paper, we extend the
                     Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the
                     resulting method, which we call BA-POMCP, is able to tackle problems
                     that previous solution methods have been unable to solve.
                     Additionally, we introduce several techniques that exploit the
                     BA-POMDP structure to improve the efficiency of BA-POMCP along with
                     proof of their convergence.}
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Oct 07, 2024 14:17:04 UTC