VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain
Authors: Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao
Venue: arXiv - CS - Robotics
Publication date: 2024-09-17
DOI: arxiv-2409.11570 (https://doi.org/arxiv-2409.11570)
Abstract
We present VertiEncoder, a self-supervised representation learning approach
for robot mobility on vertically challenging terrain. With a single
pre-training process and a single representation, VertiEncoder handles four
downstream tasks: forward kinodynamics learning, inverse kinodynamics
learning, behavior cloning, and patch reconstruction.
VertiEncoder uses a TransformerEncoder to learn the local context of its
surroundings through random masking and next-patch reconstruction. We show
that VertiEncoder outperforms specialized end-to-end models on all four tasks
while using 77% fewer parameters. We also show that VertiEncoder performs
comparably to state-of-the-art kinodynamic modeling and planning approaches
in real-world robot deployment. These results
underscore the efficacy of VertiEncoder in mitigating overfitting and fostering
more robust generalization across diverse environmental contexts and downstream
vehicle kinodynamic tasks.
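
The pre-training signal named in the abstract (random masking plus next-patch
reconstruction) can be illustrated with a minimal data-preparation sketch. This
is not the authors' implementation: the function names, the 50% mask ratio, the
use of `None` as a mask token, and the choice of the last patch as the
reconstruction target are all assumptions made for illustration, and the
encoder and loss are omitted entirely.

```python
import random


def mask_patches(patches, mask_ratio=0.5, mask_token=None, seed=None):
    """Replace a random subset of patch tokens with mask_token.

    Hypothetical helper: returns the masked copy of the sequence and the
    sorted indices that were masked. mask_ratio is an assumed value.
    """
    rng = random.Random(seed)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    masked = list(patches)
    for i in masked_idx:
        masked[i] = mask_token
    return masked, masked_idx


def make_pretraining_example(patch_sequence, mask_ratio=0.5, seed=None):
    """Split a sequence into context and next patch, then mask the context.

    Assumption: the final element plays the role of the "next patch" that
    a model would be asked to reconstruct from the masked context.
    """
    if len(patch_sequence) < 2:
        raise ValueError("need at least one context patch and one target patch")
    context, next_patch = patch_sequence[:-1], patch_sequence[-1]
    masked_context, masked_idx = mask_patches(context, mask_ratio, seed=seed)
    return masked_context, masked_idx, next_patch
```

In a full pipeline, the masked context would be fed to the encoder and the
target patch would supervise a reconstruction head; here the patches can be
any Python objects, such as strings or arrays standing in for terrain patches.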