DeepMind unveils an AI "king of games" that challenges the strongest board-game and card-game AIs with impressive results
2021-12-22
DeepMind, the top AI laboratory of Google's parent company Alphabet, became famous worldwide after its AI system AlphaGo defeated top human Go players and AlphaStar won at StarCraft II. This week it unveiled a new game-playing AI system. Unlike its previously developed systems, DeepMind's new Player of Games is the first AI algorithm to achieve strong performance in both perfect-information and imperfect-information games. Perfect-information games include board games such as Go and chess; imperfect-information games include games such as poker. This is an important step toward a truly general AI algorithm that can learn in any environment.

Player of Games plays against top AI agents in two perfect-information games, chess and Go, and two imperfect-information games, Texas hold'em poker and Scotland Yard. Based on the experimental results, DeepMind says Player of Games reaches the level of a "top human amateur" in perfect-information games, though given the same resources it would likely be significantly weaker than specialized algorithms such as AlphaZero. In both imperfect-information games, Player of Games defeated the most advanced AI agents.

Paper link: https://arxiv.org/pdf/2112.03178.pdf

01. AI systems such as Deep Blue and AlphaGo are only good at one game

Computer programs have long challenged human game players. In the 1950s, IBM scientist Arthur L. Samuel developed a checkers program that continuously improved itself through self-play. This research inspired many people and popularized the term "machine learning". Since then, game-playing AI systems have advanced steadily.
In 1992, TD-Gammon, developed at IBM, reached master level in backgammon through self-play; in 1997, IBM's Deep Blue defeated then-world chess champion Garry Kasparov; in 2016, DeepMind's AI system AlphaGo beat world Go champion Lee Sedol.

▲ IBM's Deep Blue system vs. world chess champion Garry Kasparov

One thing these AI systems have in common is that each focuses on a single game: Samuel's program and AlphaGo cannot play chess, and IBM's Deep Blue cannot play Go. AlphaGo's successor, AlphaZero, later generalized the approach, proving that by simplifying AlphaGo's method and using minimal human knowledge, a single algorithm can master three different perfect-information games. However, AlphaZero still cannot play poker, and it was unclear whether it could handle imperfect-information games at all.

The methods behind superhuman poker AI are very different. Poker depends on game-theoretic reasoning to ensure that private information is effectively concealed. Many other large-scale game AIs are likewise inspired by game-theoretic reasoning and search, including AIs for the card game Hanabi, the board game The Resistance, bridge, and DeepMind's AlphaStar for StarCraft II.

▲ January 2019: AlphaStar vs. StarCraft II professionals

Each of these advances is still built on a single game and relies on domain-specific knowledge and structure to achieve strong performance. AlphaZero and other DeepMind systems excel at perfect-information games such as chess, while algorithms such as DeepStack from the University of Alberta and Libratus from Carnegie Mellon University perform well in imperfect-information games such as poker. Against this backdrop, DeepMind has developed a new algorithm, Player of Games (PoG), which uses less domain knowledge and achieves strong performance through self-play, search, and game-theoretic reasoning.

02.
A more general algorithm, PoG: good at both board games and poker

Whether the task is route planning to ease traffic congestion, contract negotiation, or interacting with customers, people's preferences must be considered and balanced, much as in game strategy. AI systems can benefit from coordination, cooperation, and interaction among groups or organizations, and a system like Player of Games, which can reason about others' goals and motives, could cooperate successfully with others.

Playing a perfect-information game requires considerable foresight and planning: players must process what they see on the board and anticipate what their opponents may do while pursuing victory. An imperfect-information game additionally requires players to account for hidden information and to think about how to win, including when to bluff or team up against an opponent.

DeepMind says Player of Games is the first "general and sound search algorithm" to achieve strong performance in both perfect- and imperfect-information games. Player of Games (PoG) consists of two main components: 1) a new growing-tree counterfactual regret minimization (GT-CFR) search; 2) sound self-play that trains a value-and-policy network from game outcomes and recursive sub-searches.

▲ Player of Games training: actors collect data through self-play while trainers run independently on a distributed network

In perfect-information games AlphaZero is stronger than Player of Games, but in imperfect-information games AlphaZero does not cope so easily. Player of Games is very versatile, though it cannot play every game. Martin Schmid, a senior research scientist at DeepMind who took part in the research, said the AI system needs to consider all possible perspectives of each player in a game situation.
While there is only one such perspective in a perfect-information game, there can be many in an imperfect-information game: in poker, for example, there are roughly 2,000. In addition, unlike MuZero, the more advanced algorithm DeepMind developed after AlphaZero, Player of Games needs to be told the rules of the game, whereas MuZero can quickly master the rules of perfect-information games without being given them.

In its research, DeepMind evaluated Player of Games on chess, Go, Texas hold'em poker, and the strategic deduction board game Scotland Yard, using Google TPUv4 accelerator chips.

▲ In this abstracted picture of Scotland Yard, Player of Games keeps winning

In Go, AlphaZero and Player of Games played 200 games, each side taking Black 100 times and White 100 times. Across chess and Go, DeepMind also pitted Player of Games against top systems such as GnuGo and Pachi (Go engines), Stockfish (a chess engine), and AlphaZero.

▲ Relative Elo table of the agents; each agent played 200 games against the others

In chess and Go, Player of Games proved stronger than Stockfish and Pachi in certain configurations, and it won 0.5% of its games against the strongest AlphaZero. Despite the lopsided defeat against AlphaZero, DeepMind believes Player of Games performs at the level of a "top human amateur", and perhaps even at professional level.

Player of Games also played against the publicly available Slumbot in Texas hold'em, and against PimBot, developed by Joseph Antonius Maria Nijssen, in Scotland Yard.

▲ Results of the agents in the Texas hold'em and Scotland Yard games

The results show that Player of Games is the better Texas hold'em and Scotland Yard player. Against Slumbot, the algorithm won an average of 7 mbb/hand (milli-big-blinds per hand); one mbb/hand is one thousandth of a big blind per hand, so this amounts to 7 big blinds won per 1,000 hands.
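The GT-CFR search described earlier builds on counterfactual regret minimization, whose core update rule is regret matching. The toy sketch below is purely illustrative and is not DeepMind's code: it runs plain regret matching via self-play on rock-paper-scissors (our own payoff matrix and function names), showing how the average strategy drifts toward a game-theoretic equilibrium.

```python
# Illustrative sketch only: Player of Games couples a much richer
# growing-tree CFR (GT-CFR) with learned value and policy networks.
# This toy shows just the core regret-matching update that CFR-style
# algorithms are built on, using rock-paper-scissors.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [  # PAYOFF[a][b]: payoff to player 0 when p0 plays a, p1 plays b
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def utility(player, action, opp_action):
    """Zero-sum payoff from `player`'s point of view."""
    if player == 0:
        return PAYOFF[action][opp_action]
    return -PAYOFF[opp_action][action]

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # uniform if no positive regret
    return [p / total for p in pos]

def self_play(iterations=20000):
    # Small asymmetric start so the dynamics are not trivially at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in range(2):
            opp = strategies[1 - p]
            # expected value of each action against the opponent's current mix
            ev = [sum(opp[b] * utility(p, a, b) for b in range(ACTIONS))
                  for a in range(ACTIONS)]
            baseline = sum(strategies[p][a] * ev[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regrets[p][a] += ev[a] - baseline  # accumulate regret
                strategy_sums[p][a] += strategies[p][a]
    # The *average* strategy is what converges toward a Nash equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

avg = self_play()  # for rock-paper-scissors the equilibrium is (1/3, 1/3, 1/3)
```

Although the per-iteration strategies cycle, the averaged strategies approach the uniform equilibrium; GT-CFR extends this idea with a search tree that grows during solving and neural networks that replace exhaustive evaluation.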
Meanwhile, in Scotland Yard, DeepMind says Player of Games beat PimBot "significantly", even though PimBot was given more search opportunities to find winning moves.

03. Key research challenge: training costs are too high

Schmid believes Player of Games is a big step toward a truly general game-playing system. The overall trend in the experiments is that, as computing resources increase, the Player of Games algorithm yields a better approximation of a minimax-optimal strategy, and Schmid expects the approach to keep scaling for the foreseeable future. "One would expect that applications which benefited from AlphaZero could also benefit from Player of Games," he said. "Making these algorithms more general is exciting research."

Of course, such compute-heavy methods put startups, academic institutions, and other organizations with fewer resources at a disadvantage. This has been especially true in language modeling: large models such as OpenAI's GPT-3 achieve leading performance but typically require millions of dollars of resources, far beyond the budget of most research groups.

Even for a well-funded company like DeepMind, costs sometimes exceed acceptable levels. For AlphaStar, the company's researchers deliberately did not try several approaches to building key components because executives considered the training cost too high. According to DeepMind's disclosed financial filings, the company turned its first profit last year, with annual revenue of 826 million pounds (about 6.9 billion yuan) and a profit of 43.8 million pounds (about 367 million yuan); from 2016 to 2019 it lost a total of 1.355 billion pounds (about 11.3 billion yuan). The training cost of AlphaZero alone is estimated at tens of millions of dollars. DeepMind has not disclosed the research budget for Player of Games, but given that training for each game took from hundreds of thousands to millions of steps, the budget is unlikely to be low.

04.
Conclusion: game AI is helping to break through challenges in cognition and reasoning

Game AI still lacks obvious commercial applications, but DeepMind's consistent philosophy has been to use games to explore and push past the unique challenges facing machine cognition and reasoning. Over recent decades, games have driven the development of self-learning AI, which in turn has powered advances in computer vision, self-driving cars, and natural language processing. As research shifts from games to other, more commercial fields, such as recommendation systems, data-center cooling optimization, weather forecasting, materials modeling, mathematics, healthcare, and atomic-energy computing, the value of game AI research for search, learning, and game-theoretic reasoning becomes ever more prominent. "An interesting question is whether this level of play is achievable with much less computation." The question posed at the end of the Player of Games paper has no clear answer yet. (Xinhua News Agency)
Editor: Li Ling  Responsible editor: Chen Jie
Source: zhidxcom