Expanding Robot Perception

Massachusetts Institute of Technology

Robots have come a long way since the Roomba. Today, drones are starting to deliver door to door, self-driving cars are navigating some roads, robo-dogs are aiding first responders, and still more bots are doing backflips and helping out on the factory floor. Still, Luca Carlone thinks the best is yet to come.

Carlone, who recently received tenure as an associate professor in MIT's Department of Aeronautics and Astronautics (AeroAstro), directs the SPARK Lab, where he and his students are bridging a key gap between humans and robots: perception. The group does theoretical and experimental research, all toward expanding a robot's awareness of its environment in ways that approach human perception. And perception, as Carlone often says, is more than detection.

While robots have grown by leaps and bounds in terms of their ability to detect and identify objects in their surroundings, they still have a lot to learn when it comes to making higher-level sense of their environment. As humans, we perceive objects with an intuitive sense of not just their shapes and labels but also their physics - how they might be manipulated and moved - and how they relate to each other, their larger environment, and ourselves.

That kind of human-level perception is what Carlone and his group are hoping to impart to robots, in ways that enable them to safely and seamlessly interact with people in their homes, workplaces, and other unstructured environments.

Since joining the MIT faculty in 2017, Carlone has led his team in developing and applying perception and scene-understanding algorithms for various applications, including autonomous underground search-and-rescue vehicles, drones that can pick up and manipulate objects on the fly, and self-driving cars. These algorithms might also be useful for domestic robots that follow natural language commands and potentially even anticipate humans' needs based on higher-level contextual clues.

"Perception is a big bottleneck toward getting robots to help us in the real world," Carlone says. "If we can add elements of cognition and reasoning to robot perception, I believe they can do a lot of good."

Expanding horizons

Carlone was born and raised near Salerno, Italy, close to the scenic Amalfi coast, where he was the youngest of three boys. His mother is a retired elementary school teacher who taught math, and his father is a retired history professor and publisher, who has always taken an analytical approach to his historical research. The brothers may have unconsciously adopted their parents' mindsets, as all three went on to be engineers - the older two pursued electronics and mechanical engineering, while Carlone landed on robotics, or mechatronics, as it was known at the time.

He didn't come around to the field, however, until late in his undergraduate studies. Carlone attended the Polytechnic University of Turin, where he focused initially on theoretical work, specifically on control theory - a field that applies mathematics to develop algorithms that automatically control the behavior of physical systems, such as power grids, planes, cars, and robots. Then, in his senior year, Carlone signed up for a course on robotics that explored advances in manipulation and how robots can be programmed to move and function.

"It was love at first sight. Using algorithms and math to develop the brain of a robot and make it move and interact with the environment is one of the most fulfilling experiences," Carlone says. "I immediately decided this is what I want to do in life."

He went on to a dual-degree program at the Polytechnic University of Turin and the Polytechnic University of Milan, where he received master's degrees in mechatronics and automation engineering, respectively. As part of this program, called the Alta Scuola Politecnica, Carlone also took courses in management, in which he and students from various academic backgrounds had to team up to conceptualize, build, and draw up a marketing pitch for a new product design. Carlone's team developed a touch-free table lamp designed to follow a user's hand-driven commands. The project pushed him to think about engineering from different perspectives.

"It was like having to speak different languages," he says. "It was an early exposure to the need to look beyond the engineering bubble and think about how to create technical work that can impact the real world."

The next generation

Carlone stayed in Turin to complete his PhD in mechatronics. During that time, he was given freedom to choose a thesis topic, which he went about, as he recalls, "a bit naively."

"I was exploring a topic that the community considered to be well-understood, and for which many researchers believed there was nothing more to say," Carlone says. "I underestimated how established the topic was, and thought I could still contribute something new to it, and I was lucky enough to just do that."

The topic in question was "simultaneous localization and mapping," or SLAM - the problem of generating and updating a map of a robot's environment while simultaneously keeping track of where the robot is within that environment. Carlone came up with a way to reframe the problem, such that algorithms could generate more precise maps without having to start with an initial guess, as most SLAM methods did at the time. His work helped to crack open a field where most roboticists thought one could not do better than the existing algorithms.

"SLAM is about figuring out the geometry of things and how a robot moves among those things," Carlone says. "Now I'm part of a community asking, what is the next generation of SLAM?"

In search of an answer, he accepted a postdoc position at Georgia Tech, where he dove into coding and computer vision - a field that, in retrospect, may have been inspired by a brush with blindness: As he was finishing up his PhD in Italy, he suffered a medical complication that severely affected his vision.

"For one year, I could have easily lost an eye," Carlone says. "That was something that got me thinking about the importance of vision, and artificial vision."

He was able to receive good medical care, and the condition resolved entirely, such that he could continue his work. At Georgia Tech, his advisor, Frank Dellaert, showed him ways to code in computer vision and formulate elegant mathematical representations of complex, three-dimensional problems. His advisor was also one of the first to develop an open-source SLAM library, called GTSAM, which Carlone quickly recognized to be an invaluable resource. More broadly, he saw that making software available to all unlocked a huge potential for progress in robotics as a whole.

"Historically, progress in SLAM has been very slow, because people kept their codes proprietary, and each group had to essentially start from scratch," Carlone says. "Then open-source pipelines started popping up, and that was a game changer, which has largely driven the progress we have seen over the last 10 years."

Spatial AI

Following Georgia Tech, Carlone came to MIT in 2015 as a postdoc in the Laboratory for Information and Decision Systems (LIDS). During that time, he collaborated with Sertac Karaman, professor of aeronautics and astronautics, in developing software to help palm-sized drones navigate their surroundings using very little on-board power. A year later, he was promoted to research scientist, and then in 2017, Carlone accepted a faculty position in AeroAstro.

"One thing I fell in love with at MIT was that all decisions are driven by questions like: What are our values? What is our mission? It's never about low-level gains. The motivation is really about how to improve society," Carlone says. "As a mindset, that has been very refreshing."

Today, Carlone's group is developing ways to represent a robot's surroundings, beyond characterizing their geometric shape and semantics. He is utilizing deep learning and large language models to develop algorithms that enable robots to perceive their environment through a higher-level lens, so to speak. Over the last six years, his lab has released more than 60 open-source repositories, which are used by thousands of researchers and practitioners worldwide. The bulk of his work fits into a larger, emerging field known as "spatial AI."

"Spatial AI is like SLAM on steroids," Carlone says. "In a nutshell, it has to do with enabling robots to think and understand the world as humans do, in ways that can be useful."

It's a huge undertaking that could have wide-ranging impacts, in terms of enabling more intuitive, interactive robots to help out at home, in the workplace, on the roads, and in remote and potentially dangerous areas. Carlone says there will be plenty of work ahead, in order to come close to how humans perceive the world.

"I have 2-year-old twin daughters, and I see them manipulating objects, carrying 10 different toys at a time, navigating across cluttered rooms with ease, and quickly adapting to new environments. Robot perception cannot yet match what a toddler can do," Carlone says. "But we have new tools in the arsenal. And the future is bright."
