Publications

Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Shi Yuan Tang, Athirai A. Irissappane, Frans A. Oliehoek, and Jie Zhang. Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. In Proceedings of the Twentieth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1308–1316, May 2021. Invited for JAAMAS fast track.

Abstract

Cross-Entropy Method (CEM) is a gradient-free direct policy search method, which has greater stability and is insensitive to hyper-parameter tuning. CEM bears similarity to population-based evolutionary methods, but, rather than using a population, it uses a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially-trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.

BibTeX Entry

@inproceedings{Tang21AAMAS,
  author    = {Shi Yuan Tang and Athirai A. Irissappane and Frans A. Oliehoek and Jie Zhang},
  title     = {Learning Complex Policy Distribution with {CEM} Guided Adversarial Hypernetwork},
  booktitle = AAMAS21,
  year      = 2021,
  month     = may,
  pages     = {1308--1316},
  keywords  = {refereed},
  note      = {\textbf{Invited for JAAMAS fast track}},
  abstract  = {Cross-Entropy Method (CEM) is a gradient-free direct policy search method, which has greater stability and is insensitive to hyper-parameter tuning. CEM bears similarity to population-based evolutionary methods, but, rather than using a population, it uses a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially-trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8\% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.}
}
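For readers who want a concrete picture of the baseline that the paper improves upon, below is a minimal sketch of a standard Gaussian-parameterized CEM loop for direct policy search. It is an illustration only: the gymnasium CartPole environment, the linear policy, and all hyper-parameters are assumptions made for the sketch, not the authors' setup, which replaces the Gaussian sampler with an adversarially trained hypernetwork.

# Minimal sketch of a standard Gaussian-parameterized CEM policy search loop.
# Environment, linear policy, and hyper-parameters are illustrative assumptions.
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]      # 4 for CartPole
n_actions = env.action_space.n                # 2 for CartPole
dim = obs_dim * n_actions                     # parameters of a linear policy

def evaluate(theta, episodes=3):
    """Average return of the linear policy a = argmax(obs @ W)."""
    W = theta.reshape(obs_dim, n_actions)
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(int(np.argmax(obs @ W)))
            total += reward
            done = terminated or truncated
    return total / episodes

mean, std = np.zeros(dim), np.ones(dim)       # Gaussian over policy parameters
n_samples, n_elite = 50, 10
for it in range(30):
    thetas = mean + std * np.random.randn(n_samples, dim)     # sample candidate policies
    returns = np.array([evaluate(t) for t in thetas])         # score each candidate
    elite = thetas[np.argsort(returns)[-n_elite:]]            # keep the elite fraction
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3  # refit the Gaussian
    print(f"iteration {it:2d}: mean return {returns.mean():.1f}")

The paper keeps this sample-evaluate-refit structure but replaces the Gaussian sampler with a hypernetwork trained adversarially, using a multivariate Gaussian CEM policy to guide that training.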
Generated by bib2html.pl (written by Patrick Riley) on Mon Oct 07, 2024 14:17:04 UTC