Robots Learn Real-Time Decision-Making

California Institute of Technology

In 2018, Google DeepMind's AlphaZero program taught itself the games of chess, shogi, and Go using machine learning and a special algorithm to determine the best moves to win a game within a defined grid. Now, a team of Caltech researchers has developed an analogous algorithm for autonomous robots—a planning and decision-making control system that helps freely moving robots determine the best movements to make as they navigate the real world.

"Our algorithm actually strategizes and then explores all the possible and important motions and chooses the best one through dynamic simulation, like playing many simulated games involving moving robots," says Soon-Jo Chung , Caltech's Bren Professor of Control and Dynamical Systems and a senior research scientist at JPL, which Caltech manages for NASA. "The breakthrough innovation here is that we have derived a very efficient way of finding that optimal safe motion that typical optimization-based methods would never find."

The team describes the technique, which they call Spectral Expansion Tree Search (SETS), in the December cover article of the journal Science Robotics.

Many robots can move quite freely and in any direction. Consider, for example, a humanoid robot designed to assist an elderly person in a home. Such a robot should be able to move in many different ways and, essentially, in any direction within the space as it encounters obstacles or unexpected events while completing its tasks. That robot's set of movements, obstacles, and challenges will be very different from those of a self-driving car, for example.

How, then, can a single algorithm guide different robotic systems to make the best decisions to move through their surroundings?

"You don't want a designer to have to go in and handcraft these motions and say, 'This is the discrete set of moves the robot should be able to do,'" says John Lathrop, a graduate student in control and dynamical systems at Caltech and co-lead author of the new paper. "To overcome this, we came up with SETS."

SETS uses control theory and linear algebra to find natural motions that use a robotic platform's capabilities to its fullest extent in a physical setting.

The basic underlying concept is based on a Monte Carlo Tree Search, a decision-making algorithm also used by Google's AlphaZero. Here, Monte Carlo essentially means something random, and tree search refers to navigating a branching structure that represents the relationships of data in a system. In such a tree, a root branches off to so-called child nodes that are connected by edges. Using Monte Carlo Tree Search for a game like Go, possible moves are represented as new nodes, and the tree grows larger as more random samples of possible trajectories are attempted. The algorithm plays out the possible moves to see the final outcomes of the different nodes and then selects the one that offers the best outcome based on a point valuation.

The problem, Lathrop explains, is that when using this branching tree structure for continuous dynamical systems such as robots operating in the physical world, the total number of trajectories in the tree grows exponentially. "For some problems, trying to simulate every single possibility and then figure out which one is best would take years, maybe hundreds of years," he says.

To overcome this, SETS takes advantage of an exploration/exploitation trade-off. "We want to try simulating trajectories that we haven't investigated before—that's exploration," Lathrop says. "And we want to continue looking down paths that have previously yielded high reward—that's exploitation. By balancing the exploration and the exploitation, the algorithm is able to quickly converge on the optimal solution among all possible trajectories."

For example, if a robot starts out calculating a couple of possible actions that it determines would cause it to smash into a wall, there is no need for it to investigate any of the other nodes on that branch of the tree.

"This exploration/exploitation tradeoff and search over the robot's natural motions enables our robots to think, move, and adapt to new information in real-time," says Benjamin Rivière (PhD '24), a postdoctoral scholar research associate in mechanical and civil engineering at Caltech and co-lead author of the paper.

SETS can run an entire tree search in about a tenth of a second. During that time, it can simulate thousands to tens of thousands of possible trajectories, select the best one, and then act. The loop continues over and over, giving the robotic system the ability to make many decisions each second.

A key feature of the SETS algorithm is that it can be applied to essentially any robotic platform. The features and capabilities do not have to be programmed individually. In the new paper, Chung and his colleagues demonstrate the algorithm's successful utility in three completely different experimental settings—something that is very rare in robotics papers.

In the first, a quadrotor drone was able to observe four hovering white balls while avoiding four orange balls, all while navigating an airfield rife with randomly occurring, dangerous air currents, or thermals. The drone experiment was conducted at Caltech's Center for Autonomous Systems and Technologies (CAST). In the second, the algorithm augmented a human driver of a tracked ground vehicle to navigate a narrow and winding track without hitting the siderails. And in the final setup, SETS helped a pair of tethered spacecraft capture and redirect a third agent, which could represent another spacecraft, an asteroid or another object.

A team of Caltech students and researchers are currently applying a version of the SETS algorithm to an Indy car that will participate in the Indy Autonomous Challenge at the Consumer Electronics Show (CES) in Las Vegas on January 9.

The work was supported by the Defense Advanced Research Projects Agency's Learning Introspective Control (LINC) program, the Aerospace Corporation, and Supernal, and is in part based on work supported by the National Science Foundation Graduate Research Fellowship Program.

/Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).View in full here.