DanZero: Mastering GuanDan Game with Reinforcement Learning
Card games have long been a widely studied domain in artificial intelligence (AI). Recent advances have produced AI programs that play at expert level in complex card games such as Mahjong, DouDizhu, and Texas Hold’em. This paper develops an AI program for GuanDan, an exceptionally complex card game in which four players compete and cooperate over a long process to upgrade their level as quickly as possible. Developing AI for GuanDan is challenging because of its large state and action space, long episode length, and uncertainty in the number of players. To address these challenges, we propose DanZero, the first AI program for GuanDan, which is trained with reinforcement learning in a distributed framework. The framework consists of two processes: the Actor Process and the Learner Process. In the Actor Process, we design state features and generate samples through agents’ self-play. In the Learner Process, we update the model using the Deep Monte-Carlo Method. We trained DanZero for 30 days using 160 CPUs and 1 GPU. We compared DanZero with eight baseline AI programs based on heuristic rules, and the results indicate that DanZero performs exceptionally well. We further tested DanZero with human players and found that it plays at a human level. The code for DanZero can be found in the supplementary material.
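The division of labour between the Actor Process and the Learner Process can be illustrated with a minimal, single-process sketch. It is not the DanZero implementation: the neural network is replaced by a tabular running mean of returns (`QTable`), the distributed framework is collapsed into two plain functions, and the three-step toy game (`toy_reset`, `toy_legal_actions`, `toy_step`) is a hypothetical stand-in for GuanDan. All names are assumptions introduced for illustration only.

```python
import random
from collections import defaultdict


class QTable:
    """Tabular stand-in (assumption) for the Q-network: mean of observed returns."""

    def __init__(self):
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, state, action, target):
        key = (state, action)
        self.count[key] += 1
        # Incremental mean: moves Q(s, a) toward the Monte-Carlo return.
        self.value[key] += (target - self.value[key]) / self.count[key]


# Hypothetical toy game: three steps, the reward equals the last action taken.
def toy_reset():
    return 0


def toy_legal_actions(state):
    return [0, 1]


def toy_step(state, action):
    next_state = state + 1
    done = next_state == 3
    reward = float(action) if done else 0.0
    return next_state, reward, done


def actor_rollout(q, reset, step, legal_actions, epsilon, rng):
    """Actor side: play one episode epsilon-greedily and return its samples."""
    state, done, reward, trajectory = reset(), False, 0.0, []
    while not done:
        actions = legal_actions(state)
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            action = max(actions, key=lambda a: q.value[(state, a)])  # exploit
        trajectory.append((state, action))
        state, reward, done = step(state, action)
    # The reward arrives only when the episode ends, so every recorded
    # (state, action) pair is labelled with the same final return.
    return [(s, a, reward) for s, a in trajectory]


def learner_update(q, samples):
    """Learner side: regress Q(s, a) toward the sampled episode returns."""
    for state, action, episode_return in samples:
        q.update(state, action, episode_return)


def train(episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = QTable()
    for _ in range(episodes):
        samples = actor_rollout(q, toy_reset, toy_step, toy_legal_actions, epsilon, rng)
        learner_update(q, samples)
    return q
```

Calling `train()` alternates rollouts and updates in one loop. In a distributed setting, many actors would run `actor_rollout` in parallel and send their samples to a single learner, which would periodically send updated parameters back.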