Publications

• Sorted by Date • Classified by Publication Type • Classified by Research Category •

Coevolutionary Nash in Poker Games

Frans A. Oliehoek, Nikos Vlassis, and Edwin de Jong. Coevolutionary Nash in Poker Games. In Proceedings of the 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC), pp. 188–193, October 2005.

Download

pdf [139.2kB] ps.gz [128.2kB]

Abstract

We address the problem of learning good policies in poker games. Theclassical game theoretic approach to this problem specifies a Nashequilibrium solution, i.e., a pair of secure mixed (randomized)policies. We describe a new approach for calculating such securepolicies based on coevolution. Here, populations of pure policies forboth players are simultaneously evolved by repeated comparisonsagainst each other, and secure mixed policies are computed from bothpopulations by linear programming. The search heuristic for adding newcandidate pure policies involves computing a best-response pure policy(by solving a POMDP) that provides a worst-case payoff for each mixedpolicy. We provide experimental results suggesting that a Nashequilibrium policy can be approximated in relatively few iterations,thereby producing mixed policies with relatively small support. Weconclude that this is a promising direction of research and providedirections for future work.

BibTeX Entry

@InProceedings{Oliehoek05BNAIC,
    author =       {Frans A. Oliehoek and Nikos Vlassis and Edwin de Jong},
    title =        {Coevolutionary {N}ash in Poker Games},
    booktitle =    BNAIC05,
    month =        oct,
    year =         2005,
    pages =        {188--193},
    abstract = 	 {
We address the problem of learning good policies in poker games. The
classical game theoretic approach to this problem specifies a Nash
equilibrium solution, i.e., a pair of secure mixed (randomized)
policies. We describe a new approach for calculating such secure
policies based on coevolution. Here, populations of pure policies for
both players are simultaneously evolved by repeated comparisons
against each other, and secure mixed policies are computed from both
populations by linear programming. The search heuristic for adding new
candidate pure policies involves computing a best-response pure policy
(by solving a POMDP) that provides a worst-case payoff for each mixed
policy. We provide experimental results suggesting that a Nash
equilibrium policy can be approximated in relatively few iterations,
thereby producing mixed policies with relatively small support. We
conclude that this is a promising direction of research and provide
directions for future work.}
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Jun 25, 2024 12:39:45 UTC