Publications


Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Learning in POMDPs with Monte Carlo Tree Search. ArXiv e-prints, June 2018.
Version of our ICML paper including all proofs; also available from arXiv.

Download

pdf [596.8kB]  

Abstract

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.
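
To make the core idea concrete, here is a minimal, hypothetical Python sketch in the spirit of BA-POMCP on a toy two-state domain: the search state couples the domain state with Dirichlet counts over (s, a) -> (s', o) outcomes, root states are sampled from a particle belief, and each simulated step both samples from and updates the counts. The toy domain, reward function, constants, and all names below are illustrative assumptions, not the paper's implementation, benchmarks, or its specific efficiency techniques.

import math
import random
from collections import defaultdict
from copy import deepcopy

STATES = [0, 1]
ACTIONS = [0, 1]
OBS = [0, 1]
GAMMA, UCB_C, N_SIMS, MAX_DEPTH = 0.95, 2.0, 500, 10

def reward(s, a):
    # Hypothetical known reward: the action must match the hidden state.
    return 1.0 if a == s else -1.0

def init_counts():
    # Uniform Dirichlet prior: count 1 for every (s', o) outcome of every (s, a).
    return {(s, a): {(s2, o): 1.0 for s2 in STATES for o in OBS}
            for s in STATES for a in ACTIONS}

def sample_step(s, a, chi):
    # Sample (s', o) in proportion to the Dirichlet counts, then update them,
    # so later steps of the same simulation use the posterior counts.
    counts = chi[(s, a)]
    outcomes = list(counts)
    s2, o = random.choices(outcomes, weights=[counts[x] for x in outcomes])[0]
    counts[(s2, o)] += 1.0
    return s2, o

class Node:
    def __init__(self):
        self.n = defaultdict(int)      # per-action visit counts
        self.q = defaultdict(float)    # per-action value estimates

def ucb_action(node):
    # Standard UCB1 selection over the actions at this history node.
    total = sum(node.n.values()) + 1
    def score(a):
        if node.n[a] == 0:
            return float("inf")
        return node.q[a] + UCB_C * math.sqrt(math.log(total) / node.n[a])
    return max(ACTIONS, key=score)

def rollout(s, chi, depth):
    # Random-policy rollout through the count-augmented dynamics.
    if depth >= MAX_DEPTH:
        return 0.0
    a = random.choice(ACTIONS)
    s2, _ = sample_step(s, a, chi)
    return reward(s, a) + GAMMA * rollout(s2, chi, depth + 1)

def simulate(s, chi, history, depth, tree):
    # One POMCP-style simulation; histories index the search tree.
    if depth >= MAX_DEPTH:
        return 0.0
    if history not in tree:
        tree[history] = Node()
        return rollout(s, chi, depth)
    node = tree[history]
    a = ucb_action(node)
    s2, o = sample_step(s, a, chi)
    ret = reward(s, a) + GAMMA * simulate(s2, chi, history + ((a, o),), depth + 1, tree)
    node.n[a] += 1
    node.q[a] += (ret - node.q[a]) / node.n[a]
    return ret

def plan(belief_particles, history=()):
    # Root sampling: each simulation starts from a fresh copy of one
    # (state, counts) particle, so count updates never leak across simulations.
    tree = {}
    for _ in range(N_SIMS):
        s, chi = deepcopy(random.choice(belief_particles))
        simulate(s, chi, history, 0, tree)
    root = tree[history]
    return max(ACTIONS, key=lambda a: root.q[a])

if __name__ == "__main__":
    particles = [(random.choice(STATES), init_counts()) for _ in range(20)]
    print("chosen action:", plan(particles))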

BibTeX Entry

@article{Katt18arxiv,
    title =         {Learning in {POMDPs} with {Monte Carlo} Tree Search},
    author =        {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
    journal =       {ArXiv e-prints},
    archivePrefix = {arXiv},
    eprint =        {1806.05631},
    primaryClass =  {cs.AI},
    keywords =      {Computer Science - Artificial Intelligence, Computer Science - Learning, nonrefereed, arxiv},
    year =          2018,
    month =         jun,
    wwwnote =       {Version of our ICML paper including all proofs; also available from <a href="https://arxiv.org/abs/1806.05631">arXiv</a>.},
    abstract =      {The POMDP is a powerful framework for reasoning under outcome and
                     information uncertainty, but constructing an accurate POMDP model is
                     difficult. Bayes-Adaptive Partially Observable Markov Decision
                     Processes (BA-POMDPs) extend POMDPs to allow the model to be learned
                     during execution. BA-POMDPs are a Bayesian RL approach that, in
                     principle, allows for an optimal trade-off between exploitation and
                     exploration. Unfortunately, BA-POMDPs are currently impractical to
                     solve for any non-trivial domain. In this paper, we extend the
                     Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the
                     resulting method, which we call BA-POMCP, is able to tackle problems
                     that previous solution methods have been unable to solve.
                     Additionally, we introduce several techniques that exploit the
                     BA-POMDP structure to improve the efficiency of BA-POMCP along with
                     proof of their convergence.}
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Oct 07, 2024 14:17:04 UTC