Editor's note: Data scientist Shayaan Jagtap uses Mario and Pokémon as examples to explain what kinds of problems current AI is not good at.
You have probably heard that machines can play games at a superhuman level. These machines may be explicitly programmed to map set inputs to set outputs, or they may learn and evolve on their own, responding to the same input in different ways in search of the best response.
Some famous examples:
AlphaZero, after about 24 hours of training, played chess at a superhuman level.
AlphaGo, the famous Go program, defeated world-class Go players Lee Sedol and Ke Jie.
These games are very complicated, and training the machines above requires a careful combination of complex algorithms, repeated simulations, and a lot of time. This article focuses on MarI/O, a bot that learned to play a Mario level, and on why we can't use similar methods to beat a Pokémon game.
In this regard, there are three key differences between Mario and Pokémon:
Number of goals
Branching factor
Global optimization and local optimization
Number of goals
Machine learning works by optimizing an objective function. Whether that means maximizing a reward function (reinforcement learning) or a fitness function (genetic algorithms), or minimizing a cost function (supervised learning), the goal is the same: get the best possible score.
Mario has only one goal: reach the end of the level. Simply put, the farther right you get before dying, the better you did. This is a single objective function, and the model's ability can be measured directly by that one number.
Pokémon has... many goals. Defeat the Elite Four? Catch every Pokémon? Train the strongest team? All of the above? Or something else entirely?
We need to define not only the final goal but also what progress looks like, so that at any moment each of the many possible actions can be assigned a reward or a penalty.
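To make the single-number idea concrete, here is a minimal Python sketch of such a fitness function. It is not MarI/O's actual code: the function name, the list of x-positions, and the level length of 1000 are all assumptions chosen for illustration.

```python
def mario_fitness(x_positions, level_length):
    """Score a run by the farthest point reached, scaled to the range 0..1.

    x_positions: horizontal positions recorded during one run (hypothetical data).
    level_length: x-coordinate of the end of the level (hypothetical value).
    """
    farthest = max(x_positions, default=0)
    return min(farthest, level_length) / level_length


# Two made-up runs: the second one got farther right before dying.
run_a = [0, 40, 95, 130, 130]
run_b = [0, 60, 210, 480]

print(mario_fitness(run_a, 1000))
print(mario_fitness(run_b, 1000))
```

Any two runs can be ranked by comparing these two numbers, which is what makes the problem convenient for an evolutionary or reinforcement learning method.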
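A rough Python sketch shows why having several goals is awkward. It assumes a first-generation game (8 badges, 151 species, level cap 100) and a simple weighted sum; the `GameState` fields, the `progress_score` function, and the weights are hypothetical and are not taken from any real bot.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """A few hypothetical progress signals for a Pokémon save file."""
    badges: int
    species_caught: int
    team_levels: list = field(default_factory=list)


def progress_score(state, weights):
    """Combine several goals into one number using assumed weights."""
    if state.team_levels:
        avg_level = sum(state.team_levels) / len(state.team_levels)
    else:
        avg_level = 0.0
    return (
        weights["badges"] * state.badges / 8
        + weights["pokedex"] * state.species_caught / 151
        + weights["levels"] * avg_level / 100
    )


rusher = GameState(badges=6, species_caught=20, team_levels=[40] * 6)
collector = GameState(badges=2, species_caught=100, team_levels=[25] * 6)

badge_focus = {"badges": 1.0, "pokedex": 0.0, "levels": 0.0}
dex_focus = {"badges": 0.0, "pokedex": 1.0, "levels": 0.0}

# Which save file is "better" depends entirely on the chosen weights.
print(progress_score(rusher, badge_focus) > progress_score(collector, badge_focus))
print(progress_score(rusher, dex_focus) > progress_score(collector, dex_focus))
```

The same two game states swap rank when the weights change, so someone has to decide what "progress" means before any optimization can start.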
This leads to the next topic.
Branching factor
Simply put, the branching factor is the number of possible choices available at any step. The average branching factor is about 35 for chess and about 250 for Go. Each additional step of lookahead multiplies the number of options to evaluate by the branching factor, so looking n steps ahead means evaluating roughly (branching factor) to the power of n positions.
In Mario, you either go left, go right, jump, or do nothing. The number of options the machine needs to evaluate is small. And for a given amount of computing power, the smaller the branching factor, the more steps ahead the bot can look.
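The arithmetic behind this can be sketched in a few lines of Python. The branching factors of 35 and 250 come from the text, and 4 is taken from the four Mario actions listed above; the budget of one million evaluated positions is an arbitrary assumption, and both function names are made up for the example.

```python
def positions_to_evaluate(branching_factor, depth):
    """Leaf positions of an exhaustive lookahead `depth` steps deep."""
    return branching_factor ** depth


def max_lookahead(branching_factor, budget):
    """Deepest exhaustive lookahead whose leaf count fits in `budget`."""
    depth = 0
    while positions_to_evaluate(branching_factor, depth + 1) <= budget:
        depth += 1
    return depth


BUDGET = 1_000_000  # assumed number of positions we can afford to evaluate

for game, b in [("Mario", 4), ("chess", 35), ("Go", 250)]:
    print(game, positions_to_evaluate(b, 2), max_lookahead(b, BUDGET))
# Mario 16 9
# chess 1225 3
# Go 62500 2
```

With the same budget, the bot can look 9 steps ahead in Mario but only 2 or 3 in chess or Go. Real engines prune the tree rather than enumerate it, so this is only an upper-bound illustration.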
Pokémon is an open-world game, which means there are plenty of choices at any given moment. Counting up, down, left, and right does not give a useful branching factor. Instead, we need to look at the next meaningful action: start a battle, talk to an NPC, or move to the adjacent area to the left, right, top, or bottom? As the game progresses, the range of possible choices grows wider and wider.
To create a machine that can find the best combination of choices, both short-term and long-term goals need to be considered, which leads to the last topic.
Global optimization and local optimization
Local and global optimization each have a spatial and a temporal sense. Short-term goals and the immediate surrounding area are local; long-term goals and larger areas such as cities or the whole map are global.
Breaking the game down into individual steps is one way to approach the Pokémon problem. Optimizing locally to get from point A to point B is easy; deciding which destination makes the best point B is much harder. Greedy algorithms fail here, because locally optimal decisions do not necessarily lead to a global optimum.
The Mario map is small and linear, but Pokémon has an intricate, non-linear map. As you work toward high-level goals, your immediate priorities change over time, and translating a global goal into a prioritized sequence of local optimization problems is not easy. Our current models are not equipped to handle this.
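A toy Python example illustrates the last point. The four-node map and its travel costs are invented for the illustration; the greedy function always takes the cheapest next step, while the exhaustive one compares complete routes.

```python
# Made-up map: travel cost from each place to its neighbours.
graph = {
    "start": {"A": 1, "B": 4},
    "A": {"goal": 10},
    "B": {"goal": 2},
    "goal": {},
}


def greedy_cost(graph, node, goal):
    """Always take the cheapest next step (locally optimal choice)."""
    total = 0
    while node != goal:
        nxt = min(graph[node], key=graph[node].get)
        total += graph[node][nxt]
        node = nxt
    return total


def best_cost(graph, node, goal):
    """Compare every complete route and return the cheapest total."""
    if node == goal:
        return 0
    return min(
        cost + best_cost(graph, nxt, goal)
        for nxt, cost in graph[node].items()
    )


print(greedy_cost(graph, "start", "goal"))  # 11, via A
print(best_cost(graph, "start", "goal"))    # 6, via B
```

The greedy route looks better after the first step but ends up costing almost twice as much. Exhaustive comparison works on a map this small, but it is exactly what a large branching factor makes infeasible.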
One last point
From a bot's perspective, Pokémon is not a single game. Bots are specialized: a bot that helps you move around the map can do nothing when you run into an NPC who wants to battle. These are two completely different tasks.
During a battle, there are many options each turn. Choosing which move to use, which Pokémon to switch to, and when to use items is itself a complex optimization problem. I came across an article on building a battle simulator. It was very well thought out, and the complexity was already staggering even without accounting for items, which are a key factor in deciding a battle.
We can now build bots that beat us at our own games, and we should be happy about that. These games are mathematically complex but have simple objectives. As AI technology advances, we will build machines that solve increasingly impactful real-world problems by learning to handle complex optimization problems on their own. Rest assured, there are still many things we are better at than machines, including some of the games we played in childhood, at least for now. Thanks for reading!