When robots come across unfamiliar objects, they struggle to account for a simple truth: Appearances aren't everything. They may attempt to grasp a block, only to find out it's a literal piece of cake. The misleading appearance of that object could lead the robot to miscalculate physical properties like the object's weight and center of mass, causing it to use the wrong grasp and apply more force than needed.
To see through this illusion, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers designed the Grasping Neural Process, a predictive physics model capable of inferring these hidden traits in real time for more intelligent robotic grasping. Based on limited interaction data, their deep-learning system can assist robots in domains like warehouses and households at a fraction of the computational cost of previous algorithmic and statistical models.
The Grasping Neural Process is trained to infer invisible physical properties from a history of attempted grasps, and uses the inferred properties to guess which grasps would work well in the future. Prior models often identified robot grasps from visual data alone.
Typically, methods that infer physical properties build on traditional statistical techniques that require many known grasps and a great deal of computation time to work well. The Grasping Neural Process enables robots to execute good grasps more efficiently: It uses far less interaction data and finishes its computation in less than a tenth of a second, as opposed to the seconds (or minutes) required by traditional methods.
The researchers note that the Grasping Neural Process thrives in unstructured environments like homes and warehouses, since both house a plethora of unpredictable objects. For example, a robot powered by the MIT model could quickly learn how to handle tightly packed boxes with different food quantities without seeing the inside of the box, and then place them where needed. At a fulfillment center, objects with different physical properties and geometries would be placed in the corresponding box to be shipped out to customers.
Trained on 1,000 unique geometries and 5,000 objects, the Grasping Neural Process achieved stable grasps in simulation for novel 3D objects generated in the ShapeNet repository. Then, the CSAIL-led group tested their model in the physical world via two weighted blocks, where their work outperformed a baseline that only considered object geometries. Limited to 10 experimental grasps beforehand, the robotic arm successfully picked up the blocks on 18 and 19 out of 20 attempts, respectively, while the machine yielded only eight and 15 stable grasps when unprepared.
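For readers who prefer percentages, the real-world figures above can be worked out in a few lines. This is only arithmetic on the reported counts; the labels "block A" and "block B" are hypothetical, and pairing 18 with eight and 19 with 15 assumes the results are listed in the same order in both cases.

```python
ATTEMPTS = 20  # grasp attempts per block, as reported

# Stable grasps per block. Labels and the pairing order are assumptions.
with_adaptation = {"block A": 18, "block B": 19}
without_adaptation = {"block A": 8, "block B": 15}

def success_rate(successes, attempts=ATTEMPTS):
    """Return the success rate as a percentage."""
    return 100 * successes / attempts

rates = {}
for block in with_adaptation:
    adapted = success_rate(with_adaptation[block])
    unprepared = success_rate(without_adaptation[block])
    rates[block] = (adapted, unprepared, adapted - unprepared)
    print(f"{block}: {adapted:.0f}% adapted vs. {unprepared:.0f}% unprepared "
          f"(+{adapted - unprepared:.0f} points)")
```

Under that pairing, the 10 exploratory grasps lift the success rate from 40 to 90 percent on one block and from 75 to 95 percent on the other.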
While less theatrical than an actor, robots that complete inference tasks also have a three-part act to follow: training, adaptation, and testing. During the training step, robots practice on a fixed set of objects and learn how to infer physical properties from a history of successful (or unsuccessful) grasps. The new CSAIL model amortizes the inference of the objects' physics, meaning it trains a neural network to learn to predict the output of an otherwise expensive statistical algorithm. Only a single pass through a neural network with limited interaction data is needed to simulate and predict which grasps work best on different objects.
Then, the robot is introduced to an unfamiliar object during the adaptation phase. During this step, the Grasping Neural Process helps a robot experiment and update its position accordingly, understanding which grips would work best. This tinkering phase prepares the machine for the final step: testing, where the robot formally executes a task on an item with a new understanding of its properties.
"As an engineer, it's unwise to assume a robot knows all the necessary information it needs to grasp successfully," says lead author Michael Noseworthy, an MIT PhD student in electrical engineering and computer science (EECS) and CSAIL affiliate. "Without humans labeling the properties of an object, robots have traditionally needed to use a costly inference process." According to fellow lead author, EECS PhD student, and CSAIL affiliate Seiji Shaw, their Grasping Neural Process could be a streamlined alternative: "Our model helps robots do this much more efficiently, enabling the robot to imagine which grasps will inform the best result."
"To get robots out of controlled spaces like the lab or warehouse and into the real world, they must be better at dealing with the unknown and less likely to fail at the slightest variation from their programming. This work is a critical step toward realizing the full transformative potential of robotics," says Chad Kessens, an autonomous robotics researcher at the U.S. Army's DEVCOM Army Research Laboratory, which sponsored the work.
While their model can help a robot infer hidden static properties efficiently, the researchers would like to augment the system to adjust grasps in real time for multiple tasks and objects with dynamic traits. They envision their work eventually assisting with several tasks in a long-horizon plan, like picking up a carrot and chopping it. Moreover, their model could adapt to changes in mass distributions in less static objects, like when you fill up an empty bottle.
Joining the researchers on the paper is Nicholas Roy, MIT professor of aeronautics and astronautics and CSAIL member, who is a senior author. The group recently presented this work at the IEEE International Conference on Robotics and Automation.