The modern world surrounds us with the products of artificial intelligence (AI) research. These AI systems mainly work behind the scenes, in information space; they’re the filtering algorithms on a social media platform or the recommender algorithms on a retail Web site, for example. A new generation of intelligent systems that operate in both information and physical space is emerging: mobile robots.
Autonomous vehicles are probably the best-known examples of intelligent mobile robots. Although the definition of “robot” is flexible, in general, a robot can be thought of as a computer-controlled machine that senses and interacts with the physical environment—legged robots with cameras and manipulators but also cars that decide where to drive. Related systems with intelligence include robots that carry out tasks in the home, such as vacuuming the floors, cutting the lawn, or taking out the trash. In more specialized environments, we find delivery robots that move through warehouses or offices, carrying products or documents to people who need them, and urban search and rescue robots that assist human rescue workers to find survivors in collapsed buildings.
This article gives the nontechnical reader interested in military operations an overview of three areas central to AI research, in general, and robotics, in particular: perception, knowledge representation and reasoning, and decision-making for action.
- Perception – How can a robot make sense of its environment, a multiscale, dynamically changing world that includes friendly forces and enemies?
- Knowledge representation and reasoning – How can a robot turn the huge data stream of its perceptions into a persistent representation that supports questions and answers, relevance judgments, and progress toward military goals?
- Decision-making for action – How can a robot decide to act?
Before we describe these three areas, we must first consider the technical and strategic perspectives when building and using intelligent mobile robots.
TECHNICAL AND STRATEGIC PERSPECTIVES
Broadly speaking, we can judge intelligent robot behavior in each domain from two perspectives: technical and strategic. From a technical perspective, we can apply relatively well-understood engineering principles to the behavior of interest to guide us in constructing a robot, judge how difficult that construction will be, or predict how well our robot will perform. From a strategic perspective, we must make judgment calls about prospects and tradeoffs.
Some technical considerations generalize beyond AI and robotics. For example, movement over an unfamiliar, unstable surface might be facilitated by extensive physics simulations, but a mobile robot does not have the computational resources or the time to carry them out. Some problems can be categorized as provably intractable. Other considerations, however, are specific to AI and robotics.
Let’s take a simple example. For centuries, chess-playing skills have been associated with human intelligence; this naturally made the game prominent in early AI research. Imagine a robot designed to play chess—not ordinary chess on a tabletop, but outdoors, with large pieces set up on a lawn (see Figure 1). What makes this task easy or hard? A snapshot of the board, visible to players and spectators, conveys all relevant information about the state of play, which facilitates problem solving. For the robot, however, this depends on its ability to recognize pieces and locations, and even to perceive the board in its totality. The robot can pause for minutes or hours, in principle, and the game remains in stasis the entire time. Whenever the robot does act, it chooses from a fixed set of legal moves. Executing those moves, however, involves planning paths, lifting and carrying pieces, and not disturbing the other pieces on the board.
As a purely abstract game, chess is appealing, in part, for its simplicity: the board does not change with the passage of time; players choose from a small fixed set of actions; and the result of an action is completely predictable. But a physical game of chess changes this picture. The environment may not be completely observable; the robot may drop pieces on the wrong square or produce other unintended results; and unruly spectators can disturb the board or interfere with the robot’s actions. Here, we see the textbook features of a task environment, which include (in easy/hard terms) full/partial observability, discrete/continuous spaces of states and actions, deterministic/stochastic outcomes, and a static/dynamic environment. Robots, especially mobile robots, are often required to operate in task environments that pose serious challenges regarding these features.
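The easy/hard features listed above can be written down as a small data structure. The following is a minimal sketch in Python, not part of the original research; the class name `TaskEnvironment`, its field names, and the classification of the lawn chess robot on every axis are illustrative assumptions based on the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskEnvironment:
    """Hypothetical record of the four easy/hard features of a task environment."""
    name: str
    fully_observable: bool
    discrete: bool
    deterministic: bool
    static: bool

    def hard_features(self) -> List[str]:
        # Map each "easy" field to the name of its "hard" counterpart.
        labels = {
            "fully_observable": "partially observable",
            "discrete": "continuous",
            "deterministic": "stochastic",
            "static": "dynamic",
        }
        return [hard for easy, hard in labels.items() if not getattr(self, easy)]


# Abstract chess is easy on every axis; the physical game (assumed here) is hard on all four.
abstract_chess = TaskEnvironment("abstract chess", True, True, True, True)
lawn_chess = TaskEnvironment("lawn chess robot", False, False, False, False)

for env in (abstract_chess, lawn_chess):
    print(env.name, "->", env.hard_features() or "no hard features")
```

Listing the hard features side by side makes the contrast explicit: the rules of the game are unchanged, but the physical version is harder on every dimension.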
Consider this task for a robot vehicle (see Figure 2): The commander’s orders are to deliver supplies from division rear up to the Armored Brigade Combat Team. Some of the long-haul trucks are outfitted to move as unmanned ground vehicles (UGVs). A convoy is formed of a chain of segments, each segment consisting of a vehicle with a human driver and a small crew at the head, followed by four or five UGVs. The crews have been briefed on safety as well as rules of engagement. The vehicles will maintain a distance of 25 meters between crews and watch for civilian vehicles or pedestrians that cut into the convoy. There is an alert that insurgents are known to be operating along the convoy’s route. The insurgents, armed with small arms, explosives, and possible vehicle-borne improvised explosive devices, could pose a threat to the convoy. Because of the possibility of attack, the convoy should stop only in case of emergency… During the movement through one of the small towns along the route, a young man steps off the sidewalk, walking between two of the UGVs. It could be a ploy to get the convoy to stop. It could be a teenager crossing the street.
From our technical perspective, some of the same issues carry over from the robot chess example to the behavior of a hypothetical intelligent UGV. A UGV has a limited range of sensors for gathering information (implying partial observability); that information (continuous video, sound, scan data, etc.) requires interpretation to be meaningful. The UGV has limited time for its processing; aside from the dynamics of driving, detecting, identifying, and evaluating the pedestrian must happen within seconds. The UGV, even with its limited repertoire, has a difficult choice between actions: is braking enough to avoid hitting the pedestrian, or is swerving required as well? Is there a risk of crashing into a parked car or a permanent structure, with resulting damage and a halt of the convoy? How should these risks be evaluated?
Some questions cannot be answered from the technical perspective we have presented. Strategic issues must also be addressed. We will frame our discussion around what is known vs. unknown.
Often, we grapple with the unknown when dealing with a domain itself. That is, we want robots for tasks and environments that are problematic for human beings; these cases tend to involve unknowns. We value intelligence in a mobile robot because it can adapt to and deal with unknowns (e.g., driving on unfamiliar roads or off-road, searching a half-collapsed building, or finding a concealed perch for aerial surveillance). In the convoy scenario, unknowns might extend to the plausible actions and reactions of the people in the areas the UGV is driving through. Developing systems for incompletely understood tasks and environments is difficult.
Even if we do have a reasonable, informal understanding of a domain, theory is often lacking. For example, the rules of engagement have legal and ethical foundations. However, legal reasoning is still a challenge in AI, despite decades of research, and computational ethics is in its infancy. Sometimes, we do not have a clear path toward an intelligent system, one built on well-understood principles that give us reason to believe the system will be competent.
We may face another challenge in establishing good performance measures. In the convoy scenario, the timeliness of the delivery and whether the crews arrived safely are obvious measures, but other qualitative factors are more difficult to evaluate. For example, were the crews at special risk at any point in time? Was every UGV action consistent with the rules of engagement? We may be able to formalize such questions; after all, we can evaluate human performance along these lines. But this points out a new concern—the UGV is an autonomous system, and it may be hard to determine the exact contribution of its decisions to overall performance. This is characteristic when evaluating robots. We may be tempted to judge them in the same way as other machines; however, autonomy requires deeper analysis.
Finally, sometimes we can build a system for a domain that we do understand, and we can reasonably evaluate its performance. Even in these situations, we often don’t completely understand how the system works. We might have tested our robot in the laboratory, under tightly controlled conditions; we have run it through endless simulations; we have even put it through live exercises. And yet, uncertainty remains about whether the system’s performance will degrade gracefully when put to the test in the most demanding environments.
When dealing with machine learning, it is reasonable to ask, “Isn’t it possible for a robot to learn autonomously about the domain, about the theory of solving problems, and about its own performance, perhaps even to explain itself?” Machine learning, including deep learning, is not a panacea, despite many recent success stories. One limitation is relevant as a strategic issue: sometimes we cannot effectively train a system or tune its performance because the data are too sparse. This may have to do with the accessibility of a domain (as with outer space or undersea navigation) or with the cost or risk of data collection (as with real military operations).
To summarize, it is important to judge what is known about the application domain, the theoretical underpinnings for effective problem solving, performance evaluation, and the causal factors driving a system’s behavior in practice. In cases where risks and benefits can be quantified, we may be able to treat the development and deployment of an intelligent mobile robot in technical and economic terms: we can ask about its expected utility. This is not always possible, however. Instead, we must make qualitative judgments about the risks of proceeding with limited knowledge. As a final point, it is useful to know that the AI and robotics literature contains decades of research on problems that are persistent and difficult. We will see examples of hard problems later in this article.
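When risks and benefits can be quantified, expected utility is simply a probability-weighted average of outcome values. The sketch below applies it to the brake-or-swerve choice from the convoy scenario. All probabilities and utilities are invented for illustration only; they are assumptions, not measurements or values from this article.

```python
def expected_utility(outcomes):
    """Return the probability-weighted utility of a list of (probability, utility) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical outcome models for two UGV actions (all numbers are assumptions).
actions = {
    # braking: usually a short delay, occasionally a halted convoy
    "brake": [(0.9, -1.0), (0.1, -50.0)],
    # swerving: often no cost, but a sizable chance of hitting a parked car
    "swerve": [(0.7, 0.0), (0.3, -40.0)],
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
for name, outcomes in actions.items():
    print(f"{name}: {expected_utility(outcomes):.2f}")
print("preferred action:", best)
```

The arithmetic is trivial; the hard part, as the text notes, is that credible probabilities and utilities are often unavailable, which is when qualitative judgment has to take over.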
AI AND ROBOTICS RESEARCH AREAS
We will now focus on the three areas of ongoing research within the Autonomous Systems Division’s (ASD’s) Intelligent Control group in the Vehicle Technology Directorate of the U.S. Army Research Laboratory. Our descriptions will not be detailed or complete; however, we intend to give a representative picture.
To unify our discussion, we will use the convoy scenario previously outlined. Consider a human planner who is mentally running through what might possibly happen as the scenario plays out. What should be attended to, how should it be evaluated, and what are the options for action? The planner is, in some ways, behaving like a detective, but not analyzing existing clues. Instead, he or she is imagining and evaluating situations that might occur toward the end of achieving the mission goals and ensuring that Soldiers are safe.
Perception
Dana Ballard and Christopher Brown’s early account of machine perception begins at a low level, with the extraction of features from sensor data (e.g., image processing), recognition of patterns over those features, and eventually object recognition. Perception goes well beyond object recognition, as might be expected. Much of an agent’s intelligence derives from his or her perception of the task environment. This means that perceptual processing is “effectively inseparable” from high-level cognitive faculties, including memory, reasoning, and learning. Perception research in the ASD group takes a comparable, general view. The goal is to develop a framework that integrates perception, cognition, and knowledge so that adaptive learning from experience becomes possible.
In our convoy scenario, consider what a UGV might be expected to sense and flag when suggesting possible danger, acting as a proxy for a Soldier. Fewer people than usual might be moving along a given street, or perhaps the typical balance between men and women is different. One of the usually-open markets is shut down. A young man appears at the opening to a side street or is seen running toward an intersection ahead of the UGV. In these examples, basic perceptual tasks integrate and interact with higher-level processes.
Robots cannot yet manage perception and interpretation at this level. We can, however, expect near-term progress in some supporting areas. Sensor hardware will expand and grow more refined. For example, over the past decade or so, robots have increasingly used RGB-D sensors (see Figure 3) that provide color and depth information. Perception for navigation-specific tasks, including localization and mapping, will improve. Object recognition will be possible over a broader range. Machine judgments of salience (what is important in a scene, such as the movement of the young man previously mentioned) will become more accurate.
However, some perception challenges are likely to remain for the long term. Salience, for example, depends on more than visual patterns; it requires evaluating context and applying knowledge. Not all movement is important, such as people walking along a sidewalk or a child chasing after a ball, even if the visual patterns are similar. Context comes into play in evaluating the clues mentioned. Our human planner automatically realizes that people’s activities vary depending on the time of day and the day of the week; a special event such as a festival or funeral might change their behavior. Context and applying background knowledge can change the evaluation of what is perceived. Unfortunately, these are notoriously difficult challenges in AI, especially for interpreting scenes that include human beings and human artifacts. What are people doing now or in the recent past, and what does that suggest about their beliefs, plans, and future actions?
Knowledge Representation and Reasoning
The question just posed clearly does not belong only to perception; it involves knowledge and reasoning. In robotics, knowledge representation typically focuses on a world model, an internal representation of the external environment. A world model can be as simple as a database of facts. But often, it is useful for a robot to reason about what it perceives, drawing on both situation-specific and general knowledge. Work in the ASD group has been with a description logic, which supports representation and reasoning about objects and categories. The goal is a world model that can capture spatial, temporal, and semantic information relevant to air and ground systems, plus tools for analyzing, updating, and sharing between intelligent systems.
In general, the more a representation can express, the longer reasoning over it can take; this is a tradeoff between expressiveness and efficiency (or tractability). A description logic strikes a reasonable balance in the tradeoff.
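Reasoning about objects and categories can be illustrated with a toy taxonomy. The sketch below is not the ASD world model and not a real description logic reasoner; it shows only the simplest kind of category inference (subsumption along a single-parent hierarchy), and every category and individual name in it is a hypothetical example. Real description logics also support roles and restrictions, which is where the expressiveness/efficiency tradeoff starts to matter.

```python
# Hypothetical category hierarchy: each category has at most one parent.
SUBCLASS_OF = {
    "Truck": "GroundVehicle",
    "UGV": "GroundVehicle",
    "GroundVehicle": "Vehicle",
    "Pedestrian": "Person",
}

# Hypothetical facts about individuals the robot has perceived.
INSTANCE_OF = {"ugv_3": "UGV", "young_man": "Pedestrian"}


def ancestors(category):
    """Yield the category itself and then every category that subsumes it."""
    while category is not None:
        yield category
        category = SUBCLASS_OF.get(category)


def is_a(individual, category):
    """Infer membership by walking up the hierarchy from the asserted category."""
    return category in ancestors(INSTANCE_OF[individual])


print(is_a("ugv_3", "Vehicle"))      # inferred, never stated directly
print(is_a("young_man", "Vehicle"))  # no path from Pedestrian to Vehicle
```

Even this tiny model answers a question that was never stored as a fact, which is the difference between a world model that reasons and a plain database of facts.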
Representation and inference algorithms have been core areas of AI research since the inception of the field, and gradual progress has been made on both expressiveness and efficiency. More specifically, we expect improvements in automated techniques to help integrate separately developed knowledge bases; refine knowledge representations initially constructed by hand; and capture knowledge from interaction with the environment.
Hard problems remain. Maintaining the validity of the world model in a dynamic environment over time is closely tied to perception; perceptual change is constant in a mobile system. Failures will be inevitable in determining what is true and which action to take within time constraints; such failures must be handled by mechanisms outside of the world model.
Another challenge is commonsense reasoning (which can inform context). Imagine the instruction, “Look for activity in front of the tall building on the corner of the intersection.” A building described as “tall” might be 3 stories or 100 stories, depending on its surroundings. The “front” of the building may depend on the structure of the building, such as an entrance and clear walkways, but also on people’s activities. Human beings carry out such inferences effortlessly, but they require enormous amounts of stored knowledge or computation for a robot to match.
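The point that “tall” depends on surroundings can be made concrete with a deliberately naive rule. In this sketch, a building counts as tall if it clearly exceeds the median height of its neighbors; the function name `is_tall`, the 1.5 factor, and the building heights are all assumptions made up for illustration, and the rule captures only a sliver of what a person would take into account.

```python
from statistics import median


def is_tall(height_stories, neighbor_heights, factor=1.5):
    """Return True if a building stands well above its typical neighbor.

    `factor` is a hypothetical threshold chosen only for illustration.
    """
    if not neighbor_heights:
        raise ValueError("need at least one neighboring building for context")
    return height_stories >= factor * median(neighbor_heights)


village = [1, 1, 2, 1]        # assumed heights, in stories
downtown = [40, 60, 100, 55]  # assumed heights, in stories

print(is_tall(3, village))    # a 3-story building among 1-story neighbors
print(is_tall(30, downtown))  # a 30-story building among skyscrapers
```

The same word yields opposite answers for a 3-story and a 30-story building once context is supplied. Deciding which context applies, and what “front” means, is the part that remains hard.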
As another example from the convoy scenario, our human planner might imagine debris left along one stretch of the roadway, where usually it is clear. Building materials might be stacked at a point where they could be quickly turned into a barrier. These might suggest an ambush, but other observations might suggest that a building is being constructed nearby. AI systems can generate and evaluate alternative explanations for a given set of observations, but creativity and intuition in the process are not well understood.
Decision-Making for Action
Reasoning, as in the previous section, may reach conclusions about actions, but deciding to act does not always take the form of logical inference based on knowledge. For example, behavior-based robots, inspired by biological organisms, may do little reasoning at all. Their complex behaviors are layered incrementally on top of simpler behaviors. A range of other possibilities exists. Robot decisions can be formalized as Markov Decision Processes, can be the output of AI planning and scheduling algorithms, or can be produced by cognitive architectures.
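To give a flavor of the first of these formalisms, the sketch below solves a tiny Markov Decision Process with value iteration, a standard textbook algorithm. The states, actions, probabilities, and rewards are invented for illustration and do not come from this article or from any fielded system.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Compute state values and a greedy policy for a small MDP.

    transitions[state][action] is a list of (probability, next_state, reward).
    States with no actions are treated as terminal, with value 0.
    """
    values = {s: 0.0 for s in transitions}

    def q(state, action):
        # Expected reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2])
                   for p, s2, r in transitions[state][action])

    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue
            best = max(q(state, a) for a in actions)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break

    policy = {s: max(acts, key=lambda a: q(s, a))
              for s, acts in transitions.items() if acts}
    return values, policy


# Hypothetical convoy model: drive fast (risky) or cautiously (safe but less rewarding).
convoy = {
    "en_route": {
        "fast": [(0.8, "delivered", 10.0), (0.2, "delayed", -5.0)],
        "cautious": [(1.0, "delivered", 8.0)],
    },
    "delayed": {
        "fast": [(0.5, "delivered", 10.0), (0.5, "delayed", -5.0)],
        "cautious": [(1.0, "delivered", 4.0)],
    },
    "delivered": {},
}

values, policy = value_iteration(convoy)
print(policy)
```

With these made-up numbers, the computed policy is cautious while en route and fast once already delayed. The formalism trades off risk and reward mechanically, but only after someone has supplied the states, probabilities, and rewards.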
ASD research follows the last of these avenues, cognitive architectures. Cognitive processes take information from perception modules and the world model (a proxy for memory) to interpret scenes, objects, and activities in a cognitive and mission context. Interpretation and decision-making, as performed by the architecture, are shaped by what is known about human cognition. Eventually, models of learning, categorization, and analogy will be included. In other words, the approach is cognitive robotics according to the Technical Committee for Cognitive Robots: “Cognitive robots achieve their goals by… paying attention to the events that matter, planning what to do, anticipating the outcome of their actions and the actions of other agents, and learning from the resultant interaction.”
Incorporating cognitive factors into decision-making can potentially bring benefits by taking advantage of what is known about human cognitive processing. The decision process may exploit similar patterns in human cognition related to relevance and familiarity, and the results may be more easily understood by human beings. In the short term, whether cognitive robotics or more traditional AI approaches prevail, we can expect gradual progress in the ability of robots to explain their decisions and actions (e.g., in terms of justifying actions based on internal inferences and a set of percepts); to tailor their decision processes to resource constraints, such as time or computational bounds; and to adhere to constraints imposed by military doctrine.
We should not expect complete solutions in these areas, however, and hard problems will persist in other areas. For example, in some human decision-making, we see elements of intuition, resourcefulness, and even creativity in developing solutions to problems. Only very occasionally is an AI system described in similar terms—this mainly happens in highly structured games and comes as a surprise. Other long-term challenges include generalizing or transferring solutions from one domain to another, determining robustness of solutions across variations in problems, and expanding decision-making to include less understood factors, such as ethics and social norms.
We have presented a set of concepts by which problems in AI and robotics can be evaluated, that is, whether and how a robot system can be expected to reasonably deal with problems in its environment. Aspects of the task environment indicate which problems are likely to be harder than others. The harder problems are adversarial, partially observable, stochastic, dynamic, and continuous. A strategic perspective is also needed to evaluate what is known and unknown in the application domain, underlying theory, performance evaluation, and system design. For a specific area within AI and robotics, we see complexities in evaluating salience or relevance, applying context and background knowledge, commonsense reasoning, generating and evaluating explanations of human behavior, and decision-making dimensions outside of logic and utility (e.g., ethics and creativity). AI and robotic systems will continue to improve over time, as will our ability to understand and predict their performance.