Sufficient Plan-Time Statistics for Decentralized POMDPs

Frans A. Oliehoek. Sufficient Plan-Time Statistics for Decentralized POMDPs. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pp. 302–308, 2013.

Abstract

Optimal decentralized decision making in a team of cooperative agents as formalized in the framework of Decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that agents need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a probability distribution over histories and potentially states. That is, it introduces sufficient statistics for the past joint policy during the optimal planning process. These results are extended to the case of k-step delayed communication. We investigate the practical implications in a number of benchmark problems and discuss future avenues of research opened by these contributions.

BibTeX Entry

@inproceedings{Oliehoek13IJCAI,
  author = {Frans A. Oliehoek},
  title = {Sufficient Plan-Time Statistics for Decentralized {POMDPs}},
  booktitle = IJCAI13,
  year = 2013,
  pages = {302--308},
  note = {},
  url = {https://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6610},
  abstract = {
    Optimal decentralized decision making in a team of cooperative agents as formalized in the framework of Decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that agents need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the `past joint policy' can be replaced by a probability distribution over histories and potentially states. That is, it introduces sufficient statistics for the past joint policy during the optimal planning process. These results are extended to the case of k-step delayed communication. We investigate the practical implications in a number of benchmark problems and discuss future avenues of research opened by these contributions.
  }
}
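
The central idea in the abstract, that a probability distribution over states and joint observation histories can stand in for the past joint policy, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the two-agent toy model, and its probabilities are all hypothetical, and the update shown is only a generic forward-probability recursion under deterministic, decentralized decision rules.

```python
from collections import defaultdict


def update_statistic(sigma, decision_rules, trans_obs):
    """Propagate a distribution over (state, joint observation history) one stage.

    sigma: dict mapping (state, joint_history) -> probability, where
        joint_history is a tuple of joint observations (one tuple per stage).
    decision_rules: one function per agent, mapping that agent's OWN
        observation history (a tuple) to its action (hypothetical interface).
    trans_obs: function (state, joint_action) -> dict mapping
        (next_state, joint_observation) -> probability (hypothetical interface).
    """
    new_sigma = defaultdict(float)
    for (state, joint_history), prob in sigma.items():
        # Each agent only sees its own component of every joint observation.
        joint_action = tuple(
            rule(tuple(obs[i] for obs in joint_history))
            for i, rule in enumerate(decision_rules)
        )
        for (next_state, joint_obs), q in trans_obs(state, joint_action).items():
            new_sigma[(next_state, joint_history + (joint_obs,))] += prob * q
    return dict(new_sigma)


def toy_model(state, joint_action):
    """Hypothetical two-state, two-agent dynamics (assumed for illustration)."""
    # The state flips only when both agents choose action 1.
    next_state = 1 - state if joint_action == (1, 1) else state
    result = {}
    for o1 in (0, 1):
        for o2 in (0, 1):
            # Each agent independently observes the next state with accuracy 0.8.
            p1 = 0.8 if o1 == next_state else 0.2
            p2 = 0.8 if o2 == next_state else 0.2
            result[(next_state, (o1, o2))] = p1 * p2
    return result


def last_obs_rule(own_history):
    """Hypothetical decision rule: repeat the last own observation, else act 0."""
    return own_history[-1] if own_history else 0


# Uniform initial belief over the two states, with an empty joint history.
sigma0 = {(0, ()): 0.5, (1, ()): 0.5}
sigma1 = update_statistic(sigma0, (last_obs_rule, last_obs_rule), toy_model)
sigma2 = update_statistic(sigma1, (last_obs_rule, last_obs_rule), toy_model)
```

In this sketch, planning for a later stage would only need `sigma2`, not the decision rules that produced it, which is the sense in which such a distribution summarizes the past joint policy.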