MOST: Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction

IEEE Transactions on Visualization and Computer Graphics. Published: 2025-07-11. DOI: 10.1109/TVCG.2025.3588509

Yin Wang, Mu Li, Zhiying Leng, Frederick W B Li, Xiaohui Liang

Abstract

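We introduce MOST, a novel MOtion diffuSion model via Temporal clip Banzhaf interaction, which addresses the persistent challenge of generating human motion from rare language prompts. Previous approaches rely on coarse-grained matching and, because of motion redundancy, overlook important semantic cues; our key insight is to exploit fine-grained clip relationships to mitigate these issues. MOST's retrieval stage introduces the first formulation of its kind, temporal clip Banzhaf interaction, which precisely quantifies text-motion coherence at the clip level. This enables direct, fine-grained text-to-motion clip matching and eliminates prevalent redundancy. In the generation stage, a motion prompt module uses the retrieved motion clips to produce semantically consistent movements. Extensive evaluations show that MOST achieves state-of-the-art text-to-motion retrieval and generation performance, with quantitative and qualitative results highlighting its effectiveness, especially for rare prompts.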
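The abstract does not give MOST's exact formula. As a rough illustration of the underlying idea, the sketch below computes the standard pairwise Banzhaf interaction index from cooperative game theory, treating text words and motion clips as players: for players i and j it averages v(S∪{i,j}) − v(S∪{i}) − v(S∪{j}) + v(S) over every coalition S of the remaining players. The value function (`make_value_fn`, which scores each word by its best-matching clip in the coalition), the toy 2-D embeddings, and all names are hypothetical assumptions, not the paper's formulation.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, FrozenSet, Sequence, Tuple

# A player is either a text word or a motion clip, e.g. ("word", 0) or ("clip", 2).
Player = Tuple[str, int]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_value_fn(words, clips) -> Callable[[FrozenSet[Player]], float]:
    """HYPOTHETICAL coalition value (not from the paper): each word in the
    coalition is scored by its best-matching clip in the same coalition;
    the value is 0 when the coalition lacks words or clips."""
    def value(coalition: FrozenSet[Player]) -> float:
        word_ids = [idx for kind, idx in coalition if kind == "word"]
        clip_ids = [idx for kind, idx in coalition if kind == "clip"]
        if not word_ids or not clip_ids:
            return 0.0
        return sum(max(cosine(words[w], clips[c]) for c in clip_ids) for w in word_ids)
    return value


def banzhaf_interaction(i: Player, j: Player, players: Sequence[Player],
                        value: Callable[[FrozenSet[Player]], float]) -> float:
    """Standard pairwise Banzhaf interaction index of players i and j, by exact
    enumeration of all 2^(n-2) coalitions of the other players."""
    others = [p for p in players if p != i and p != j]
    total = 0.0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = frozenset(subset)
            total += value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
    return total / 2 ** len(others)


if __name__ == "__main__":
    words = [[1.0, 0.0]]                # one toy word embedding
    clips = [[1.0, 0.0], [0.0, 1.0]]    # one matching clip, one unrelated clip
    players = [("word", 0), ("clip", 0), ("clip", 1)]
    v = make_value_fn(words, clips)
    print(banzhaf_interaction(("word", 0), ("clip", 0), players, v))  # 1.0
    print(banzhaf_interaction(("word", 0), ("clip", 1), players, v))  # 0.0
```

With this toy value function, duplicating the matching clip (two identical clips `[1.0, 0.0]`) lowers each copy's interaction with the word to 0.5, because the copies are interchangeable within any coalition; this is one way a coalition-based score can discount redundant clips. Exact enumeration is exponential in the number of players, so a realistic system would need an approximation such as sampling coalitions; the abstract does not say which approach MOST takes.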
