When
humans face a complex challenge, we create a plan composed of
individual, related steps. Often, these plans are formed as natural
language sentences. This approach enables us to achieve our goal and
also adapt to new challenges, because we can leverage elements of
previous plans to tackle new tasks, rather than starting from scratch
each time.
Facebook
AI has developed a new method of teaching AI to plan effectively, using
natural language to break down complex problems into high-level plans
and lower-level actions. Our system innovates by using two AI models —
one that gives instructions in natural language and one that interprets
and executes them — and it takes advantage of the structure in natural
language in order to address unfamiliar tasks and situations. We’ve
tested our approach using a new real-time strategy game called
MiniRTSv2, and found it outperforms AI systems that simply try to
directly imitate human gameplay.
We’re now sharing our results, which will be presented at NeurIPS 2019 later this year, and open-sourcing MiniRTSv2 so other researchers can use it to build and test their own imitation and reinforcement learning systems.
The AI research community has long found it challenging to bring this
hierarchical decision-making process to AI systems. Doing so has typically
meant that researchers manually specify how to break a problem down
into macro-actions, which is difficult to scale and requires expertise.
Alternatively, if the AI system has been trained to focus on the end
task, it is likely to learn how to achieve success through a single
composite action rather than a hierarchy of steps. Our work with
MiniRTSv2 shows that a natural language-based method can help address
these challenges.
While
this is foundational research, it suggests that by using language to
represent plans, these systems can more efficiently generalize to a
variety of tasks and adapt to new circumstances. We believe this can
bring us closer to our long-term goal of building AI that can adapt and
generalize in real-world settings.
Building MiniRTSv2, an open source, NLP-ready game environment
MiniRTSv2
is a streamlined strategy game designed specifically for AI research.
In the game, a player commands archers, dragons, and other units in
order to defeat an opponent.
In
this sample MiniRTSv2 gameplay — recorded directly from the tool’s
interface — all the instructions that appear below the map field are
generated by an instructor model, while the corresponding in-game
actions, such as building and attacking units, are carried out by a
separate executor model.
Though
MiniRTSv2 is intentionally simpler and easier to learn than commercial
games such as DOTA 2 and StarCraft, it still allows for complex
strategies that must account for large state and action spaces,
imperfect information (areas of the map are hidden when friendly units
aren’t nearby), and the need to adapt strategies to the opponent’s
actions. Used as a training tool for AI, the game can help agents learn
effective planning skills, whether through NLP-based techniques or other
kinds of training, such as reinforcement and imitation learning.
Using language to generate high-level plans and assign low-level instructions
We
used MiniRTSv2 to train AI agents to first express a high-level
strategic plan as natural language instructions and then to act on that
plan with the appropriate sequence of low-level actions in the game
environment. This approach leverages natural language’s built-in
benefits for learning to generalize to new tasks. Those include the
expressive nature of language — different combinations of words can
represent virtually any concept or action — as well as its compositional
structure, which allows people to combine and rearrange words to create
new sentences that others can then understand. We applied these
features to the entire process of planning and execution, from the
generation of strategy and instructions to the interface that bridges
the different parts of the system’s hierarchical structure.
Our
AI agent plays a real-time strategy game using two models. The
instructor creates plans based on continually observing the game state
and issues instructions in natural language to the executor. The
executor grounds these instructions as actions, based on the current
state of the game.
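The control loop described above can be sketched as two components passing natural language between them. This is a minimal illustration of the interface, not the actual Facebook AI implementation: the class names, the rule-based policies, and the action format are all hypothetical stand-ins for what are, in the real system, learned models.

```python
class Instructor:
    """Observes the game state and emits a natural language instruction."""

    def act(self, state):
        # In the real system this is a learned model; here a trivial
        # rule illustrates the interface.
        if state["dragons"] < 1:
            return "build 1 dragon"
        return "attack"


class Executor:
    """Grounds the latest instruction into a low-level game action."""

    def act(self, state, instruction):
        # Map the language command onto a concrete game action.
        if instruction.startswith("build"):
            return {"action": "build", "unit": instruction.split()[-1]}
        return {"action": "attack"}


def play_step(instructor, executor, state):
    # One time step: the instructor plans in language, the executor
    # grounds that plan in the current game state.
    instruction = instructor.act(state)
    action = executor.act(state, instruction)
    return instruction, action
```

The key design point is that the only channel between the two models is a natural language sentence, which is what lets the hierarchy inherit language's compositional structure.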
The
AI agent that we built to test this approach consists of a two-level
hierarchy — an instructor model that decides on a course of action and
issues commands, and an executor model that carries out those
instructions. We trained both models using a data set collected from
human participants playing MiniRTSv2.
Those
participants worked in instructor-executor pairs, with designated
instructors issuing orders in the form of written text, and executors
accessing the game’s controls to carry those orders out. The commands
ranged from clear-cut directives, such as “build 1 dragon,” to general
instructions, such as “attack.” We used these natural language
interactions between players to generate a data set of 76,000 pairs of
instructions and executions across 5,392 games.
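One way to picture an entry in this kind of instruction-execution data set is as a pairing of a human instructor's sentence with the executor's recorded actions. The field names and layout below are purely illustrative assumptions, not the released data format.

```python
# Hypothetical record for one instruction-execution pair.
pair = {
    "game_id": 17,                       # which of the recorded games
    "instruction": "build 1 dragon",     # instructor's written command
    "executor_actions": [                # low-level actions that followed
        {"frame": 120, "action": "build", "unit": "dragon"},
    ],
}
```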
Leveraging the versatility of natural language to learn more generalized plans
Though
MiniRTSv2 isn’t designed solely for NLP-related work, the game
environment’s text interface allows us to explore ambiguous and
context-dependent linguistic features that are relevant to building more
versatile AI. For example, given the instruction “make two more cavalry
and send them over with the other ones,” the executor model has to
grasp that “the other ones” are existing cavalry, an inference that’s
simple for most humans, but potentially challenging for AI. The agent
also has to account for the kind of potentially confusing nuances that
are common in natural language. The specific command “send idle peasant
to mine mineral” should lead to the same action as the comparatively
vague “back to mine,” which doesn’t specify which units should be moved.
At
each time step within a given MiniRTSv2 game, our system relies on
three encoders to turn inputs into feature vectors that the model can
use. The observation encoder focuses on spatial inputs (where game
objects appear on the map) and nonspatial inputs (such as the type of
unit or building that a given game object represents); the instruction
encoder generates vectors from a recent list of natural language
instructions; and the auxiliary encoder learns vectors for the remaining
global game attributes (such as the total amount of resources a player
has).
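The three-encoder pipeline can be sketched as follows. The real encoders are neural networks producing learned feature vectors; these pure-Python stand-ins only show how spatial, linguistic, and global inputs are encoded separately and then combined. Every function name and encoding choice here is a hypothetical simplification (for instance, summarizing an instruction by its word count in place of an RNN).

```python
def observation_encoder(spatial, nonspatial):
    # Spatial inputs: where objects appear on the map.
    # Nonspatial inputs: e.g., unit or building type ids.
    return [float(x) for x in spatial] + [float(x) for x in nonspatial]


def instruction_encoder(instructions):
    # Stand-in for a learned encoder over recent instructions:
    # represent each sentence by its word count.
    return [float(len(s.split())) for s in instructions]


def auxiliary_encoder(resources, time_step):
    # Remaining global game attributes, such as total resources.
    return [float(resources), float(time_step)]


def encode_state(spatial, nonspatial, instructions, resources, t):
    # The model consumes the concatenation of all three feature vectors.
    return (observation_encoder(spatial, nonspatial)
            + instruction_encoder(instructions)
            + auxiliary_encoder(resources, t))
```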
Rather than clarifying phrasing or eliminating redundant permutations
of the same order, we intentionally left the human instruction examples
(and the corresponding executor actions) as they were delivered. The
instructor model can't formulate original sentences; it has to select
from instructions that appeared in human play-throughs. This forces the
agent to develop pragmatic inference, learning how to plan and execute
based on natural language as humans actually use it, even when that
usage is imprecise.
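Because the instructor chooses among instructions observed in human games rather than composing new sentences, its output step reduces to an argmax over a fixed candidate pool. The sketch below assumes a scoring function standing in for the learned model; the candidate list reuses example commands from this post, and everything else is a hypothetical illustration.

```python
# Fixed pool of instructions drawn from human play-throughs
# (examples quoted in this post).
CANDIDATES = [
    "build 1 dragon",
    "attack",
    "send idle peasant to mine mineral",
    "back to mine",
]


def select_instruction(score_fn, candidates=CANDIDATES):
    # The instructor scores every candidate given the game state
    # (folded into score_fn here) and picks the highest scorer.
    return max(candidates, key=score_fn)
```

In the trained system, the scoring function conditions on the encoded game state, so the same pool can yield different commands in different situations.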
Training
our system to not only generate latent language commands but also
understand the context of those instructions resulted in a significant
boost in performance over more traditional agents. Using MiniRTSv2, we
pitted a number of different agents against an AI opponent that was
trained to directly imitate human actions, without taking language into
account. The results from these experiments showed that language
consistently improved agents’ win rates. For example, our most
sophisticated NLP-based agent, which uses a recurrent neural network
(RNN) encoder to help differentiate similar orders, beat the
non-language-based AI opponent 57.9 percent of the time. That’s
substantially better than the imitation-based agent’s 41.2 percent win
rate.
This
is the first model to show improvements in planning by generating and
executing latent natural language instructions. And though we employed a
video game to evaluate our agents, the implications of this work go far
beyond boosting the skills of game-playing AI bots, suggesting the
long-term potential of employing language to improve generalization. Our
evaluations showed that performance gains for NLP-based agents
increased with larger instruction sets, as the models were able to use
the compositional structure within language to better generalize across a
wide range of examples.
And
in addition to improving generalization, this approach has the
significant side benefit of demonstrating how decision-making AI systems
can be simultaneously high-performing, versatile, and more
interpretable. If an agent’s planning process is based on natural
language, with sentences mapped directly to actions, understanding how a
system arrived at a given action could be as simple as reading its
internal transcript. The ability to quickly vet an AI’s behavior could
be particularly useful for AI assistants, potentially allowing a user to
fine-tune the system’s future actions.
Building language-based AI assistants through open science and collaboration
While
our results have focused on using language as an aid for hierarchical
decision-making, improving the ability of AI systems to utilize and
understand natural language could pave the way for an even wider range
of potential long-term benefits, such as assistants that are better at
adapting to unfamiliar tasks and surroundings. Progress in this area
might also yield systems that respond better to spoken or written
commands, making devices and platforms more accessible to people who
aren’t able to operate a touchscreen or mouse.
As
promising as our results have been, the experimental task that we’re
presenting, the NLP-based data set that we’ve created, and the MiniRTSv2
environment that we’ve updated are all novel contributions to the
field. Exploring their full potential will require a substantial
collective effort, which is why we’re inviting the wider AI community to use them.
And these resources aren’t limited to one task — for example, since the
MiniRTSv2 interface makes it easy to isolate the language activity from
the recorded games, our data set of sample commands could be valuable
for researchers training NLP systems, even if their work is unrelated to
game performance or hierarchical decision-making. We look forward to
seeing the results and insights that other researchers generate using
these tools, as we continue to advance the application of language to
improve the quality, versatility, and transparency of AI
decision-making.