Publications• Sorted by Date • Classified by Publication Type • Classified by Research Category • Best-response Play in Partially Observable Card GamesFrans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Best-response Play in Partially Observable Card Games. In Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands (Benelearn), pp. 45–50, February 2005. DownloadAbstractWe address the problem of how to play optimally against a fixedopponent in a two-player card game with partial information likepoker. A game theoretic approach to this problem would specify a pairof stochastic policies that are best-responses to each other, i.e., aNash equilibrium. Although such a Nash-optimal policy guarantees alower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeatedplay), then we can model the problem as a partially observable Markovdecision process (POMDP) from the perspective of one agent, and solveit by dynamic programming. In particular, for a large class of cardgames including poker, the derived POMDP consists of a finite numberof belief states and it can be solved exactly. The resulting policy isguaranteed to be optimal even against a Nash-optimal policy. Weprovide experimental results to support our claims, using a simplified8-card poker game in which Nash-policies can be computed efficiently. BibTeX Entry@InProceedings{Oliehoek05Benelearn, author = {Frans A. Oliehoek and Matthijs T. J. Spaan and Nikos Vlassis}, title = {Best-response Play in Partially Observable Card Games}, booktitle = Benelearn05, month = feb, year = 2005, pages = {45--50}, url = {https://research.utwente.nl/files/5088581/benelearn05.pdf}, abstract = { We address the problem of how to play optimally against a fixed opponent in a two-player card game with partial information like poker. A game theoretic approach to this problem would specify a pair of stochastic policies that are best-responses to each other, i.e., a Nash equilibrium. Although such a Nash-optimal policy guarantees a lower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeated play), then we can model the problem as a partially observable Markov decision process (POMDP) from the perspective of one agent, and solve it by dynamic programming. In particular, for a large class of card games including poker, the derived POMDP consists of a finite number of belief states and it can be solved exactly. The resulting policy is guaranteed to be optimal even against a Nash-optimal policy. We provide experimental results to support our claims, using a simplified 8-card poker game in which Nash-policies can be computed efficiently.} }
Generated by
bib2html.pl
(written by Patrick Riley) on
Mon Oct 07, 2024 14:17:04 UTC |