Researchers have a new way to evaluate the intelligence of an AI model: put it in the game Minecraft without information about its environment and see how well it plays.
Minecraft is not only the best-selling video game in history, it could also be the key to creating adaptive artificial intelligence (AI) models that can solve a variety of problems in much the same way as humans do, writes New Scientist.
Stephen James of the University of the Witwatersrand in South Africa and his colleagues developed a benchmark test in Minecraft to measure the general intelligence of AI models. MinePlanner evaluates an AI’s ability to ignore irrelevant details when solving a complex problem with many steps.
James says that in many cases, AI training involves “cheating” by giving the model all the data it needs to learn how to do a job, and nothing else. This approach makes sense if you want to create software to perform a specific task, such as predicting the weather or folding proteins, but not if you are trying to create artificial general intelligence (AGI).
Future AI models will need to solve complex problems, and James hopes MinePlanner will guide that research. An AI working on a problem in the game will see terrain, extraneous objects and other details that are not needed to solve the problem and should be ignored. It will have to study the environment and figure out on its own what is needed and what is not.
MinePlanner consists of 15 construction tasks, each with easy, medium and hard levels, for a total of 45 tasks. To complete each task, the AI may need to take intermediate steps, for example building a ladder to place blocks at a certain height. This requires the AI to step back from the immediate problem and plan ahead to reach the goal.
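To make the idea of an intermediate step concrete, here is a minimal sketch of a toy planner. It is not MinePlanner and not how ENHSP or Fast Downward work internally; it is a plain breadth-first search over a made-up one-dimensional world. The names `REACH`, `TARGET_HEIGHT`, `build_rung` and `place_block` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy world, not part of MinePlanner.
REACH = 2          # how far above its feet the agent can place a block
TARGET_HEIGHT = 5  # height at which the goal block must be placed


def successors(state):
    """Yield (action, next_state) pairs; state is (agent_height, block_placed)."""
    height, placed = state
    # Intermediate step: add one ladder rung and climb it.
    if height < TARGET_HEIGHT:
        yield "build_rung", (height + 1, placed)
    # Goal action: only possible once the target height is within reach.
    if not placed and height + REACH >= TARGET_HEIGHT:
        yield "place_block", (height, True)


def plan(start=(0, False)):
    """Breadth-first search for a shortest action sequence that places the block."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state[1]:  # block has been placed
            return actions
        for name, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [name]))
    return None


print(plan())
```

With these assumed values the goal action is unavailable from the ground, so the search has to discover that three rungs must be built first. Real benchmark tasks involve far larger state spaces, which is where the irrelevant details become costly.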
In experiments with two state-of-the-art AI planners, ENHSP and Fast Downward (open-source programs designed to work out a sequence of actions that achieves a goal), neither was able to solve a single hard task. Fast Downward solved only one of the medium tasks and five of the easy ones, while ENHSP did considerably better, solving all but one of the easy tasks and all but two of the medium ones.
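The reported figures can be tallied in a few lines. This is a rough sketch that assumes each difficulty level contains 15 tasks, as the 15-by-3 breakdown implies, and labels the middle tier "medium" and the top tier "hard". The dictionary layout and the `summarize` helper are illustrative only.

```python
TASKS_PER_LEVEL = 15  # assumption: 15 tasks at each of the three difficulty levels

# Solved counts taken from the figures reported in the article.
solved = {
    "Fast Downward": {"easy": 5, "medium": 1, "hard": 0},
    "ENHSP": {
        "easy": TASKS_PER_LEVEL - 1,    # all but one
        "medium": TASKS_PER_LEVEL - 2,  # all but two
        "hard": 0,
    },
}


def summarize(per_level):
    """Return (tasks solved, fraction of all tasks solved) for one planner."""
    total_solved = sum(per_level.values())
    total_tasks = TASKS_PER_LEVEL * len(per_level)
    return total_solved, total_solved / total_tasks


for planner, per_level in solved.items():
    count, fraction = summarize(per_level)
    print(f"{planner}: {count} of {TASKS_PER_LEVEL * len(per_level)} tasks ({fraction:.0%})")
```

Under that assumption, Fast Downward solved 6 of the 45 tasks and ENHSP solved 27.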