AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
arXiv - CS - Robotics, 2024-09-18, arXiv:2409.11905
Abstract
This paper presents AlignBot, a novel framework designed to optimize
VLM-powered customized task planning for household robots by effectively
aligning with user reminders. In domestic settings, aligning task planning with
user reminders poses significant challenges due to the limited quantity,
diversity, and multimodal nature of the reminders. To address these challenges,
AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for
GPT-4o. This adapter model internalizes diverse forms of user reminders (such as
personalized preferences, corrective guidance, and contextual assistance) into
structured, instruction-formatted cues that prompt GPT-4o to generate
customized task plans. Additionally, AlignBot integrates a dynamic retrieval
mechanism that selects task-relevant historical successes as prompts for
GPT-4o, further enhancing task planning accuracy. To validate the effectiveness
of AlignBot, experiments are conducted in real-world household environments,
which are constructed within the laboratory to replicate typical household
settings. A multimodal dataset with over 1,500 entries derived from volunteer
reminders is used for training and evaluation. The results demonstrate that
AlignBot significantly improves customized task planning, outperforming
existing LLM- and VLM-powered planners by interpreting and aligning with user
reminders, achieving an 86.8% success rate compared with 21.6% for the vanilla
GPT-4o baseline, a gain of about 65 percentage points and slightly more than four
times the baseline's success rate. Supplementary materials are available at:
https://yding25.com/AlignBot/
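
The abstract does not specify how the dynamic retrieval mechanism scores task relevance or how the planner prompt is laid out. The following is a minimal sketch under stated assumptions, not the authors' implementation: word-overlap (Jaccard) similarity stands in for the unspecified relevance measure, the adapter's cues are passed in as plain strings, and every name (`PastSuccess`, `retrieve`, `build_prompt`) is hypothetical. No model is called; the sketch only assembles the text that would be sent to the planner.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PastSuccess:
    """A previously successful task and its executed plan (hypothetical record)."""
    task: str
    plan: list[str]


def tokenize(text: str) -> set[str]:
    """Lowercase, whitespace-split bag of words."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two word sets; an assumed stand-in for the paper's relevance score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(task: str, history: list[PastSuccess], k: int = 2) -> list[PastSuccess]:
    """Return the k past successes whose task descriptions best match the new task."""
    query = tokenize(task)
    ranked = sorted(history, key=lambda h: jaccard(query, tokenize(h.task)), reverse=True)
    return ranked[:k]


def build_prompt(task: str, cues: list[str], examples: list[PastSuccess]) -> str:
    """Combine adapter-generated cues and retrieved successes into one planner prompt."""
    lines = ["Reminder cues:"]
    lines += [f"- {cue}" for cue in cues]
    lines.append("Relevant past successes:")
    for ex in examples:
        lines.append(f"- Task: {ex.task} | Plan: {' -> '.join(ex.plan)}")
    lines.append(f"New task: {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [
        PastSuccess("make a cup of coffee", ["get mug", "brew coffee", "serve"]),
        PastSuccess("clean the kitchen table", ["get cloth", "wipe table"]),
        PastSuccess("make a cup of tea", ["get mug", "boil water", "steep tea", "serve"]),
    ]
    task = "make a cup of green tea"
    cues = ["User prefers the blue mug."]  # illustrative cue, not taken from the paper
    print(build_prompt(task, cues, retrieve(task, history)))
```

In the framework described above, the cues would come from the fine-tuned LLaVA-7B adapter and the assembled prompt would go to GPT-4o; both steps are left out here because the abstract gives no interface details for them.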