DeepMind’s MuZero masters games while learning how to play them – TechCrunch

DeepMind has made it a mission to show that not only can an AI become truly proficient at a game, it can do so without even being told the rules. Its newest AI agent, called MuZero, accomplishes this not just with visually simple games with complex strategies, like Go, Chess and Shogi, but with visually complex Atari games.

The success of DeepMind's earlier AIs was at least partly due to efficient navigation of the immense decision trees that represent the possible actions in a game. In Go or Chess these trees are governed by very specific rules, like where pieces can move, what happens when this piece does that, and so on.
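To illustrate what "navigating a decision tree" means in practice, here is a minimal sketch of rule-based game-tree search in Python. It is illustrative only: real systems like AlphaGo use far more sophisticated search (Monte Carlo tree search guided by neural networks), and the toy "game" here is hypothetical, with each inner list representing a choice point and each leaf a known outcome scored from the first player's perspective.

```python
def minimax(node, maximizing=True):
    """Walk the full decision tree, assuming both players play optimally.

    A leaf (a number) is a final game outcome; an inner list is a choice
    point where the current player picks one branch.
    """
    if isinstance(node, (int, float)):  # leaf: a known game outcome
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# A tiny decision tree: the first player picks a branch, then the
# opponent picks a leaf within it.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree))  # best outcome the first player can guarantee
```

The point of the example is the dependency it exposes: the tree itself can only be built because the rules say which moves are legal and what they lead to. Take away the rules, and this whole approach has nothing to search.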

The AI that defeated world champions at Go, AlphaGo, knew these rules and kept them in mind (or perhaps in RAM) while studying games between and against human players, building a set of best practices and strategies. Its successor, AlphaGo Zero, did this without any human data, playing only against itself. AlphaZero did the same in 2018 with Go, Chess and Shogi, producing a single AI model that could play all these games proficiently.

But in all of these cases the AI was presented with a set of immutable, known rules for the game, around which it could build a framework for formulating its own strategies. Think about it: if you're told that a pawn can become a queen, you plan for that from the beginning, but if you have to discover it for yourself, you might develop entirely different strategies.

This helpful diagram shows what the different models achieved with different starting knowledge. Image: DeepMind

As the company explains in a blog post about its new research, if an AI has to be told the rules ahead of time, "this makes it hard to apply them to messy real-world problems, which are typically complex and difficult to distill into simple rules."

The company's latest advance is MuZero, which plays not only the aforementioned games but a variety of Atari games as well, and it does all this without being provided a rulebook. The final model learned all of these games purely from its own experimentation (no human data), without even being told the most basic rules.

Rather than using the rules to find the best-case scenario (because it can't), MuZero learns to take into account every aspect of the game environment, observing for itself whether any given part matters. Over millions of games it learns not just the rules, but the general value of a position, general policies for getting ahead, and a way of evaluating its own actions in hindsight.
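The MuZero paper frames these learned quantities as three functions: a representation function that encodes raw observations into a hidden state, a dynamics function that predicts the next hidden state and reward for an action, and a prediction function that outputs a policy and a value. The sketch below follows that structure, but the function bodies are arbitrary stand-in arithmetic, not trained networks; all the specific numbers are invented for illustration.

```python
def representation(observation: str) -> int:
    """h: encode a raw observation (e.g. screen pixels) into a hidden state.
    Stand-in for a learned encoder network."""
    return sum(ord(c) for c in observation) % 100

def dynamics(hidden_state: int, action: int):
    """g: predict the next hidden state and immediate reward for an action.
    Stand-in arithmetic in place of a learned dynamics network."""
    next_state = (hidden_state * 31 + action) % 100
    reward = 1.0 if next_state % 2 == 0 else 0.0  # arbitrary stand-in rule
    return next_state, reward

def prediction(hidden_state: int):
    """f: predict a policy (which moves look promising) and a value
    (how good the position is). Stand-in outputs."""
    policy = [0.5, 0.3, 0.2]       # stand-in move probabilities
    value = hidden_state / 100.0   # stand-in position evaluation
    return policy, value

# Planning happens entirely inside the learned model: from one real
# observation, the agent imagines a sequence of actions and their
# consequences without ever consulting the game's actual rules.
state = representation("frame-0")
for action in [0, 1, 2]:
    state, reward = dynamics(state, action)
policy, value = prediction(state)
```

The key design choice this sketch is meant to convey: the real game rules never appear anywhere. The agent searches over imagined futures produced by its own learned dynamics function, which is exactly why no rulebook is needed.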

This latter ability helps it learn from its own mistakes, rewinding and redoing games to try different approaches that further hone its position and policy values.

You may remember Agent57, another DeepMind creation that excelled at a set of 57 Atari games. MuZero takes the best of that AI and combines it with the best of AlphaZero. MuZero differs from the former in that it does not model the entire game environment, but focuses only on the parts that influence its decision-making, and from the latter in that it bases its model of the rules entirely on its own experimentation and firsthand knowledge.

Understanding the game world lets MuZero effectively plan its actions even when the game world is, like many Atari games, partly random and visually complex. This pushes it closer to an AI that can safely and intelligently interact with the real world, learning to understand its surroundings without needing to be told every detail (though it's likely that a few, such as "don't crush the humans," will be etched in stone). As one of the researchers told the BBC, the team is already looking at how MuZero could improve video compression – evidently a very different problem from Ms. Pac-Man.

MuZero is described in a paper published today in the journal Nature.