Publications

Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Learning in POMDPs with Monte Carlo Tree Search. ArXiv e-prints, June 2018.

BibTeX Entry:

@article{Katt18arxiv,
title = {Learning in {POMDPs} with {Monte Carlo} Tree Search},
author = {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
journal = {ArXiv e-prints},
archivePrefix = {arXiv},
eprint = {1806.05631},
primaryClass = {cs.AI},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Learning, nonrefereed, arxiv},
year = 2018,
month = jun,
wwwnote = {Version of our ICML paper including all proofs, also available from <a href="https://arxiv.org/abs/1806.05631">arXiv</a>.},
abstract = {The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.}
}
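To make the abstract's setting concrete, below is a minimal, illustrative sketch of the kind of Monte-Carlo Tree Search over histories that POMCP performs, applied to the classic "tiger" POMDP rather than the paper's BA-POMDP. This is not the authors' BA-POMCP implementation: all names (`TigerSim`, `plan`), the reward and observation constants, and the exploration parameters are hypothetical choices made only to show the simulate/rollout structure.

```python
import math
import random

ACTIONS = ["listen", "open-left", "open-right"]

class TigerSim:
    """Generative model G(s, a) -> (s', o, r) for the tiger POMDP.

    State 0/1 encodes which door hides the tiger.
    """
    def step(self, state, action):
        if action == "listen":
            # Hear the tiger behind the correct door with probability 0.85.
            obs = state if random.random() < 0.85 else 1 - state
            return state, obs, -1.0
        # Opening the tiger's door costs -100, the other door pays +10;
        # the problem then resets to a random state.
        reward = -100.0 if ACTIONS.index(action) - 1 == state else 10.0
        return random.randint(0, 1), random.randint(0, 1), reward

class Node:
    """One history node: visit counts and running mean returns per action."""
    def __init__(self):
        self.N = 0
        self.Na = {a: 0 for a in ACTIONS}
        self.Qa = {a: 0.0 for a in ACTIONS}
        self.children = {}  # (action, observation) -> Node

def ucb_action(node, c):
    # UCB1: try each action once, then trade off value against exploration.
    best, best_val = None, float("-inf")
    for a in ACTIONS:
        if node.Na[a] == 0:
            return a
        val = node.Qa[a] + c * math.sqrt(math.log(node.N) / node.Na[a])
        if val > best_val:
            best, best_val = a, val
    return best

def rollout(sim, state, depth, gamma):
    # Default policy: uniformly random actions to the horizon.
    if depth == 0:
        return 0.0
    a = random.choice(ACTIONS)
    state, _, r = sim.step(state, a)
    return r + gamma * rollout(sim, state, depth - 1, gamma)

def simulate(sim, state, node, depth, gamma, c):
    if depth == 0:
        return 0.0
    a = ucb_action(node, c)
    state2, obs, r = sim.step(state, a)
    child = node.children.get((a, obs))
    if child is None:
        node.children[(a, obs)] = Node()
        ret = r + gamma * rollout(sim, state2, depth - 1, gamma)
    else:
        ret = r + gamma * simulate(sim, state2, child, depth - 1, gamma, c)
    node.N += 1
    node.Na[a] += 1
    node.Qa[a] += (ret - node.Qa[a]) / node.Na[a]  # incremental mean
    return ret

def plan(particles, n_sims=2000, depth=10, gamma=0.95, c=10.0):
    """Run n_sims simulations from states sampled from a particle belief."""
    root, sim = Node(), TigerSim()
    for _ in range(n_sims):
        simulate(sim, random.choice(particles), root, depth, gamma, c)
    return max(ACTIONS, key=lambda a: root.Qa[a])
```

For example, with a belief concentrated on the tiger being behind the left door (`plan([0] * 100)`), the search should avoid "open-left". The BA-POMCP extension described in the paper additionally samples a model (e.g. Dirichlet counts over transitions and observations) alongside the state at the root; this sketch omits that model-learning component.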