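Game characteristics: two-player zero-sum game; turn-based game; imperfect-information game; non-cooperative game
Potential applications: recommender systems, energy management, smart finance, healthcare, autonomous driving, command and decision-making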
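Platform overview: Games have a long history as testbeds for artificial intelligence research. Recently, methods that combine game-theoretic reasoning with learning have achieved notable success in imperfect-information games, poker in particular. An imperfect-information game is one in which information is asymmetric: players cannot observe the full state of the game, such as an opponent's private cards. Compared with perfect-information games, imperfect-information games are more common in real life. Texas Hold'em, a typical example of an imperfect-information game, has also seen breakthroughs in recent years, and the success of Libratus and DeepStack has drawn considerable attention from researchers. However, high-level Texas Hold'em AIs such as Libratus and DeepStack have not released their code. In addition, the underlying theory is hard to follow, few technical details have been published, and model training is expensive. As a result, high-level Texas Hold'em AIs are difficult to reproduce, which to a large extent limits research and development in imperfect-information game theory and techniques.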
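To address the difficulty of model training, we developed a Texas Hold'em training and evaluation platform. Built on a microservice framework, the platform integrates model training, model evaluation, and human-versus-AI play. It makes research on imperfect-information games more convenient and helps agent developers quickly learn the complete agent development workflow and its techniques, so that they can build and train an AI quickly. This lets researchers concentrate on the algorithms themselves and makes research on new algorithms more efficient.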
References:
[1] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374):418–424, 2018.
[2] Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI, 2019.
[6] Noam Brown, Tuomas Sandholm, and Brandon Amos. Depth-limited solving for imperfect-information games. arXiv preprint arXiv:1805.08195, 2018.
[7] Gabriele Farina, Christian Kroer, Noam Brown, and Tuomas Sandholm. Stable-predictive optimistic counterfactual regret minimization. ArXiv, abs/1902.04982, 2019.
[8] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Optimistic regret minimization for extensive-form games via dilated distance-generating functions. In NeurIPS, 2019.
[9] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Regret circuits: Composability of regret minimizers. In ICML, 2019.
[10] M. Hartley. Multi-agent counterfactual regret minimization for partial-information collaborative games. 2017.
[11] Hui Li, Kailiang Hu, Zhibang Ge, Tao Jiang, Yuan Qi, and L. Song. Double neural counterfactual regret minimization. ArXiv, abs/1812.10607, 2020.
[12] Kai Li, Hang Xu, Meng Zhang, Enmin Zhao, Zhe Wu, Junliang Xing, and Kaiqi Huang. OpenHoldem: An open toolkit for large-scale imperfect-information game research. arXiv preprint arXiv:2012.06168, 2020.
[13] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.
[14] Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravčík, Rudolf Kadlec, and Michael H. Bowling. Variance reduction in Monte Carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines. ArXiv, abs/1809.03057, 2019.
[15] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[16] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20, volume 20, pages 1729–1736, 2007.