A bot that watched 70,000 hours of Minecraft could unlock AI’s next big thing
OpenAI has built the best Minecraft-playing bot yet by making it watch 70,000 hours of video of people playing the popular computer game. The new technique could be used to train machines to carry out a wide range of tasks simply by watching videos online, a huge and so far untapped source of training data. The Minecraft AI learned to perform complicated sequences of keyboard and mouse clicks to complete tasks in the game, such as chopping down trees and crafting tools. It is the first bot that can craft so-called diamond tools, a task that typically takes good human players 20 minutes of high-speed clicking, or around 24,000 actions.
The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained to carry out tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars, and navigate webpages.
There is a vast amount of video online showing people performing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models, says Bowen Baker, one of the team at OpenAI behind the new Minecraft bot. "In the past few years we have seen the rise of large models trained on huge swathes of the internet, and we see amazing capabilities," he says. "A large portion of that is because we are modeling what humans do when they go on the internet."
The problem with existing approaches to imitation learning is that video demonstrations need to be labeled at every step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way is a lot of work, so such datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos available online into a new dataset.
The team's approach, called VPT (Video Pre-Training), gets around the labeling bottleneck by training another neural network to label videos automatically. They first hired crowdworkers to play Minecraft and recorded their keyboard and mouse clicks alongside the video from their screens. This gave the researchers 2,000 hours of annotated Minecraft play, which they used to train a model to match actions to onscreen outcomes: clicking a mouse button in a certain situation makes the character swing its axe, for example.
The next step was to use this model to generate action labels for 70,000 hours of unlabeled video taken from the internet, and then train the Minecraft bot on this much larger dataset. "Video is a training resource with a lot of potential," says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning.
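The label-then-clone pipeline can be sketched in miniature. The snippet below is a toy illustration, not OpenAI's implementation: the real VPT models are large neural networks, whereas here the action-labeling "model" and the final policy are simple majority-vote tables, and every frame and action name is made up.

```python
from collections import Counter, defaultdict

def train_labeler(labeled_clips):
    """Stage 1: learn to infer which action was taken between two frames,
    from a small hand-annotated dataset (2,000 hours in the real project).
    Here the 'model' is just a majority vote per frame transition."""
    votes = defaultdict(Counter)
    for before, after, action in labeled_clips:
        votes[(before, after)][action] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in votes.items()}

def pseudo_label(labeler, frames):
    """Stage 2: run the labeler over unlabeled internet video
    (70,000 hours in the real project) to guess the action at each step."""
    return [(frames[i], labeler.get((frames[i], frames[i + 1])))
            for i in range(len(frames) - 1)]

def imitation_train(pseudo_labeled):
    """Stage 3: train a policy mapping a frame to an action,
    using the guessed labels as supervision."""
    votes = defaultdict(Counter)
    for frame, action in pseudo_labeled:
        if action is not None:
            votes[frame][action] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}

# Hypothetical data: frames are strings, actions are key/mouse events.
labeled = [
    ("facing_tree", "tree_chopped", "click_mouse"),
    ("facing_tree", "tree_chopped", "click_mouse"),
    ("open_inventory", "planks_made", "press_c"),
]
labeler = train_labeler(labeled)
internet_video = ["facing_tree", "tree_chopped", "open_inventory", "planks_made"]
policy = imitation_train(pseudo_label(labeler, internet_video))
print(policy["facing_tree"])  # -> click_mouse
```

The point of the sketch is the division of labor: a small expensive labeled dataset trains the labeler once, after which arbitrarily large amounts of cheap unlabeled video can be converted into training data for the policy.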
Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch via trial and error. This technique is responsible for many of the most significant AI breakthroughs over the past few years. It has been used to train models that can beat humans at games, control a fusion reactor, and discover a faster way to do fundamental math.
The problem with reinforcement learning is that it works best when there is a clear goal: random actions that happen to lead toward that goal get rewarded by the algorithm, which makes those accidental successes more likely to happen again.
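That reward loop can be shown with a toy example. The snippet below is a minimal, hypothetical sketch of the idea, not any algorithm from the paper: the agent starts out acting at random, and whenever an action accidentally achieves the goal, its selection weight is increased so it becomes more likely in the future.

```python
import random

random.seed(0)
ACTIONS = ["move_left", "move_right", "chop"]
weights = {a: 1.0 for a in ACTIONS}  # start with no preference

def goal_reached(action):
    # Hypothetical environment: only "chop" produces wood, the goal.
    return action == "chop"

for step in range(200):
    # Sample an action in proportion to its current weight.
    action = random.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]
    if goal_reached(action):        # an "accidental success"
        weights[action] += 0.5      # reward it: raise its likelihood

print(max(weights, key=weights.get))  # -> chop
```

Because the unrewarded actions keep their initial weight while the rewarded one compounds, the policy drifts toward the goal, which is exactly why the technique struggles in a game with no clear goal to reward.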
But Minecraft is a game with no clear goal. Players can do whatever they want, roaming through a computer-generated universe, mining different materials, and combining them to create different objects.
That open-endedness makes Minecraft a good environment for training AI. Baker was one of the researchers behind Hide & Seek, a project in which bots were let loose in a virtual playground and used reinforcement learning to figure out how to cooperate and use tools to win simple games. But the bots soon outgrew their surroundings. "The agents almost took over the universe; there was nothing for them to do," says Baker. "We wanted them to grow, and we thought Minecraft would be a great place to do so."
They’re not the only ones. Minecraft is becoming a key testbed for new AI techniques. MineDojo, a Minecraft environment with dozens of predesigned challenges, won an award at this year’s NeurIPS, one of the biggest AI conferences.
Using VPT, OpenAI's bot was able to carry out tasks that would have been impossible with reinforcement learning alone, such as crafting planks and turning them into a table, which involves around 970 consecutive actions. Even so, the team found that a combination of imitation learning and reinforcement learning produced the best results: taking a bot trained with VPT and fine-tuning it with reinforcement learning let it carry out tasks involving more than 20,000 consecutive actions.
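That combination can also be illustrated with a toy sketch. Everything below is invented for illustration (the real system uses large neural networks in the actual Minecraft environment): a policy's action preferences are initialized from imitation of human play, then sharpened by reinforcement learning on a task reward.

```python
import random

random.seed(1)
ACTIONS = ["wander", "chop", "craft"]

# Imitation pretraining: start the policy from action frequencies
# observed in (hypothetical) human gameplay video. Without this prior,
# a rarely rewarded action might never be discovered by chance.
policy = {"wander": 2.0, "chop": 10.0, "craft": 8.0}

def reward(action):
    # Hypothetical task reward: only crafting scores.
    return 1.0 if action == "craft" else 0.0

# Reinforcement-learning fine-tuning: sample actions from the
# pretrained policy and reinforce the ones that earn reward.
for _ in range(100):
    action = random.choices(ACTIONS, weights=[policy[a] for a in ACTIONS])[0]
    policy[action] += reward(action)

print(max(policy, key=policy.get))
```

The design idea mirrors the team's finding: imitation gives the policy sensible behavior to start from, and reinforcement learning then amplifies the parts of that behavior that actually achieve the task.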
The researchers say their approach could be used to train AI to carry out other tasks too. At first these might be tasks that involve using a keyboard and mouse, such as navigating websites, booking flights, or ordering groceries online. But in theory it could be used to train robots to carry out physical, real-world tasks by copying first-person video of people doing those things. That is possible, says Stone.
Matthew Guzdial, a university professor in Canada who has used videos to teach AI the rules of games such as Super Mario Bros., doesn't think that will happen anytime soon. In games like Minecraft or Super Mario Bros., actions are performed by pressing buttons; actions in the physical world are far more complex and much harder for a machine to learn. "It unlocks many new research problems," says Guzdial.
"This work is yet another testament to the power of scaling models and training on large datasets to get high performance," says Natasha Jaques, who works on multi-agent reinforcement learning at Google and the University of California, Berkeley.
Large internet-sized data sets will certainly unlock new capabilities for AI, says Jaques. It’s probably the best Minecraft-playing bot yet, says Baker: “But with more data and bigger models I would expect it to feel like you’re watching a human playing the game, as opposed to a baby AI trying to mimic a human.”