Why is picking up things so hard for robots and non-humans? originally appeared on Quora: the place to gain and share knowledge, empowering people to learn from others and better understand the world.
I work on robotic grasping and manipulation research using deep learning-based vision approaches.
On the hardware side: human hands are prehensile and flexible, and can adapt to many different configurations and wrap around a wide variety of shapes. This makes it possible for a hand to “envelop” an object, or a functional component of it, so that the object can be lifted.
By contrast, many animals lack the flexibility of the human hand or an opposable thumb. Imagine you are a puppy trying to pick up a frisbee using only your paws.
On the software side: even with a capable manipulator (hand), picking things up with the speed and accuracy of humans requires a general understanding of how objects behave and how they should be handled to ensure a stable grasp. For example, I could pick up a cup by grasping its rim, but a far more stable and energy-efficient grasp is simply to hold it by the handle.
As another example, consider a computer mouse with a loose cable. If the perception system assumes that all objects are rigid, it will probably plan the grasp incorrectly: the cable will drag on the ground unless the robot performs some sequential manipulation, such as bunching up the cable to pick up the mouse in one go, or moving the mouse first and the cable afterward.
This kind of “common sense” knowledge is difficult to imbue in a robotic vision system, or even in non-human animals, because they are inexperienced with the world of graspable objects that humans interact with every day.
Unfamiliarity also makes motor control more difficult. Neither we nor animals can reliably execute motions that we don’t practice often. In the case of learning robots, “practice” reinforces actions in a different way, but the analogy roughly holds. In fact, many people experience this difficulty firsthand when learning to pick things up with chopsticks.
Before end-to-end learning-based approaches were deployed in robotics (i.e., using deep learning to learn everything from scratch with a lot of data plus trial and error), the prevailing approach for grasping work was to use specialized hardware or SLAM to acquire approximate 3D geometry, then solve 3D kinematics or dynamics models (similar to those used in computer graphics) to plan optimal grasps.
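To give a flavor of what that geometry-based grasp planning reduces to: a classic criterion is the antipodal (force-closure) condition for a two-fingered grasp, which checks that the line between the two contact points lies inside the friction cone at each contact. Below is a minimal 2D sketch of that check; the function name, the friction coefficient, and the parallel-jaw example are my own illustrative simplifications, not a production planner.

```python
import math

def antipodal_grasp_ok(p1, n1, p2, n2, mu=0.5):
    """Check the antipodal (force-closure) condition for two contacts.

    p1, p2: contact points on the object surface.
    n1, n2: inward-pointing unit surface normals at those contacts.
    mu: Coulomb friction coefficient (illustrative value).
    """
    half_angle = math.atan(mu)  # friction cone half-angle
    # Unit direction of the line connecting the two contacts.
    d = [b - a for a, b in zip(p1, p2)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # The connecting line must lie inside both friction cones:
    # its angle to each inward normal must not exceed half_angle.
    cos1 = sum(a * b for a, b in zip(d, n1))    # cone at p1
    cos2 = sum(-a * b for a, b in zip(d, n2))   # cone at p2
    return cos1 >= math.cos(half_angle) and cos2 >= math.cos(half_angle)

# Parallel-jaw grasp across a box: opposing faces, aligned normals.
print(antipodal_grasp_ok((0, 0), (1, 0), (1, 0), (-1, 0)))  # stable grasp
# Same contacts, but one normal is badly misaligned with the grasp axis.
print(antipodal_grasp_ok((0, 0), (0, 1), (1, 0), (-1, 0)))  # not stable
```

A full planner would search over many candidate contact pairs on the reconstructed 3D surface and score them with richer wrench-based quality metrics, but this is the geometric core that such pipelines evaluate.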