MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
{"title":"MoRAG -- 针对人体运动的多融合检索增强生成技术","authors":"Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla","doi":"arxiv-2409.12140","DOIUrl":null,"url":null,"abstract":"We introduce MoRAG, a novel multi-part fusion based retrieval-augmented\ngeneration strategy for text-based human motion generation. The method enhances\nmotion diffusion models by leveraging additional knowledge obtained through an\nimproved motion retrieval process. By effectively prompting large language\nmodels (LLMs), we address spelling errors and rephrasing issues in motion\nretrieval. Our approach utilizes a multi-part retrieval strategy to improve the\ngeneralizability of motion retrieval across the language space. We create\ndiverse samples through the spatial composition of the retrieved motions.\nFurthermore, by utilizing low-level, part-specific motion information, we can\nconstruct motion samples for unseen text descriptions. Our experiments\ndemonstrate that our framework can serve as a plug-and-play module, improving\nthe performance of motion diffusion models. Code, pretrained models and sample\nvideos will be made available at: https://motion-rag.github.io/","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion\",\"authors\":\"Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla\",\"doi\":\"arxiv-2409.12140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce MoRAG, a novel multi-part fusion based retrieval-augmented\\ngeneration strategy for text-based human motion generation. The method enhances\\nmotion diffusion models by leveraging additional knowledge obtained through an\\nimproved motion retrieval process. 
By effectively prompting large language\\nmodels (LLMs), we address spelling errors and rephrasing issues in motion\\nretrieval. Our approach utilizes a multi-part retrieval strategy to improve the\\ngeneralizability of motion retrieval across the language space. We create\\ndiverse samples through the spatial composition of the retrieved motions.\\nFurthermore, by utilizing low-level, part-specific motion information, we can\\nconstruct motion samples for unseen text descriptions. Our experiments\\ndemonstrate that our framework can serve as a plug-and-play module, improving\\nthe performance of motion diffusion models. Code, pretrained models and sample\\nvideos will be made available at: https://motion-rag.github.io/\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12140\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12140","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos will be made available at: https://motion-rag.github.io/
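The core idea of multi-part retrieval with spatial composition can be illustrated with a minimal sketch: retrieve a motion independently for each body part from its own part-level database, then assemble a full-body sequence by taking each part's joints from its retrieved motion. Everything below is hypothetical, not MoRAG's actual implementation: the joint groups, the database layout, and the toy word-overlap scoring (which stands in for the learned text-to-motion retrieval models the paper actually uses).

```python
import numpy as np

# Hypothetical joint groups for a 22-joint, SMPL-like skeleton (assumed layout).
PART_JOINTS = {
    "torso": [0, 3, 6, 9, 12, 15],
    "left_arm": [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def retrieve_part_motion(part_db, query, part):
    """Pick the motion whose description best matches the query (toy scoring).

    part_db maps part name -> list of (description, motion) pairs, where each
    motion is a (frames, 22, 3) array of joint positions. Word overlap is used
    here only so the sketch is self-contained; a real system would score with
    learned text/motion embeddings.
    """
    query_words = set(query.lower().split())
    best_desc, best_motion = max(
        part_db[part],
        key=lambda pair: len(query_words & set(pair[0].lower().split())),
    )
    return best_motion

def compose_parts(part_motions):
    """Spatially compose per-part motions into one full-body sequence.

    Each part contributes only its own joints; sequences are truncated to the
    shortest retrieved motion so all parts share a common frame count.
    """
    frames = min(m.shape[0] for m in part_motions.values())
    n_joints = sum(len(j) for j in PART_JOINTS.values())
    full_body = np.zeros((frames, n_joints, 3))
    for part, joints in PART_JOINTS.items():
        full_body[:, joints] = part_motions[part][:frames, joints]
    return full_body
```

Because composition happens at the level of low-level joint trajectories rather than whole sequences, combining parts retrieved for different sub-phrases can yield motions for text descriptions never seen during training, which is the generalization the abstract points to.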