Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Learning in POMDPs with Monte Carlo Tree Search. ArXiv e-prints, June 2018.

Abstract

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

BibTeX Entry

@article{Katt18arxiv,
  title = {Learning in {POMDPs} with {Monte Carlo} Tree Search},
  author = {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1806.05631},
  primaryClass = "cs.AI",
  keywords = {Computer Science - Artificial Intelligence, Computer Science - Learning, nonrefereed, arxiv},
  year = 2018,
  month = jun,
  wwwnote = {Version of our ICML paper including all proofs, also available from <a href="https://arxiv.org/abs/1806.05631">arXiv</a>.},
  abstract = {The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.}
}