MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
{"title":"MoRAG -- 针对人体运动的多融合检索增强生成技术","authors":"Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla","doi":"arxiv-2409.12140","DOIUrl":null,"url":null,"abstract":"We introduce MoRAG, a novel multi-part fusion based retrieval-augmented\ngeneration strategy for text-based human motion generation. The method enhances\nmotion diffusion models by leveraging additional knowledge obtained through an\nimproved motion retrieval process. By effectively prompting large language\nmodels (LLMs), we address spelling errors and rephrasing issues in motion\nretrieval. Our approach utilizes a multi-part retrieval strategy to improve the\ngeneralizability of motion retrieval across the language space. We create\ndiverse samples through the spatial composition of the retrieved motions.\nFurthermore, by utilizing low-level, part-specific motion information, we can\nconstruct motion samples for unseen text descriptions. Our experiments\ndemonstrate that our framework can serve as a plug-and-play module, improving\nthe performance of motion diffusion models. Code, pretrained models and sample\nvideos will be made available at: https://motion-rag.github.io/","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion\",\"authors\":\"Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla\",\"doi\":\"arxiv-2409.12140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce MoRAG, a novel multi-part fusion based retrieval-augmented\\ngeneration strategy for text-based human motion generation. The method enhances\\nmotion diffusion models by leveraging additional knowledge obtained through an\\nimproved motion retrieval process. 
By effectively prompting large language\\nmodels (LLMs), we address spelling errors and rephrasing issues in motion\\nretrieval. Our approach utilizes a multi-part retrieval strategy to improve the\\ngeneralizability of motion retrieval across the language space. We create\\ndiverse samples through the spatial composition of the retrieved motions.\\nFurthermore, by utilizing low-level, part-specific motion information, we can\\nconstruct motion samples for unseen text descriptions. Our experiments\\ndemonstrate that our framework can serve as a plug-and-play module, improving\\nthe performance of motion diffusion models. Code, pretrained models and sample\\nvideos will be made available at: https://motion-rag.github.io/\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12140\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12140","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos will be made available at: https://motion-rag.github.io/
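The core idea of multi-part retrieval with spatial composition can be illustrated with a minimal sketch: retrieve a motion independently for each body part from its own part-level database, then assemble a full-body sequence by taking each part's joints from its retrieved motion. Everything below is hypothetical, not MoRAG's actual implementation: the joint groups, the database layout, and the toy word-overlap scoring (which stands in for the learned text-to-motion retrieval models the paper actually uses).

```python
import numpy as np

# Hypothetical joint groups for a 22-joint, SMPL-like skeleton (assumed layout).
PART_JOINTS = {
    "torso": [0, 3, 6, 9, 12, 15],
    "left_arm": [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def retrieve_part_motion(part_db, query, part):
    """Pick the motion whose description best matches the query (toy scoring).

    part_db maps part name -> list of (description, motion) pairs, where each
    motion is a (frames, 22, 3) array of joint positions. Word overlap is used
    here only so the sketch is self-contained; a real system would score with
    learned text/motion embeddings.
    """
    query_words = set(query.lower().split())
    best_desc, best_motion = max(
        part_db[part],
        key=lambda pair: len(query_words & set(pair[0].lower().split())),
    )
    return best_motion

def compose_parts(part_motions):
    """Spatially compose per-part motions into one full-body sequence.

    Each part contributes only its own joints; sequences are truncated to the
    shortest retrieved motion so all parts share a common frame count.
    """
    frames = min(m.shape[0] for m in part_motions.values())
    n_joints = sum(len(j) for j in PART_JOINTS.values())
    full_body = np.zeros((frames, n_joints, 3))
    for part, joints in PART_JOINTS.items():
        full_body[:, joints] = part_motions[part][:frames, joints]
    return full_body
```

Because composition happens at the level of low-level joint trajectories rather than whole sequences, combining parts retrieved for different sub-phrases can yield motions for text descriptions never seen during training, which is the generalization the abstract points to.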