Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
{"title":"Hand-Object Interaction Pretraining from Videos","authors":"Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik","doi":"arxiv-2409.08273","DOIUrl":null,"url":null,"abstract":"We present an approach to learn general robot manipulation priors from 3D\nhand-object interaction trajectories. We build a framework to use in-the-wild\nvideos to generate sensorimotor robot trajectories. We do so by lifting both\nthe human hand and the manipulated object in a shared 3D space and retargeting\nhuman motions to robot actions. Generative modeling on this data gives us a\ntask-agnostic base policy. This policy captures a general yet flexible\nmanipulation prior. We empirically demonstrate that finetuning this policy,\nwith both reinforcement learning (RL) and behavior cloning (BC), enables\nsample-efficient adaptation to downstream tasks and simultaneously improves\nrobustness and generalizability compared to prior approaches. Qualitative\nexperiments are available at: \\url{https://hgaurav2k.github.io/hop/}.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08273","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
We present an approach to learn general robot manipulation priors from 3D
hand-object interaction trajectories. We build a framework to use in-the-wild
videos to generate sensorimotor robot trajectories. We do so by lifting both
the human hand and the manipulated object in a shared 3D space and retargeting
human motions to robot actions. Generative modeling on this data gives us a
task-agnostic base policy. This policy captures a general yet flexible
manipulation prior. We empirically demonstrate that finetuning this policy,
with both reinforcement learning (RL) and behavior cloning (BC), enables
sample-efficient adaptation to downstream tasks and simultaneously improves
robustness and generalizability compared to prior approaches. Qualitative
experiments are available at: \url{https://hgaurav2k.github.io/hop/}.