NeFT-Net: N-window extended frequency transformer for rhythmic motion prediction

IF 2.5, CAS Region 4 (Computer Science), JCR Q2 (Computer Science, Software Engineering)
Adeyemi Ademola, David Sinclair, Babis Koniaris, Samantha Hannah, Kenny Mitchell
{"title":"NeFT-Net: N-window extended frequency transformer for rhythmic motion prediction","authors":"Adeyemi Ademola ,&nbsp;David Sinclair ,&nbsp;Babis Koniaris ,&nbsp;Samantha Hannah ,&nbsp;Kenny Mitchell","doi":"10.1016/j.cag.2025.104244","DOIUrl":null,"url":null,"abstract":"<div><div>Advancements in prediction of human motion sequences are critical for enabling online virtual reality (VR) users to dance and move in ways that accurately mirror real-world actions, delivering a more immersive and connected experience. However, latency in networked motion tracking remains a significant challenge, disrupting engagement and necessitating predictive solutions to achieve real-time synchronization of remote motions. To address this issue, we propose a novel approach leveraging a synthetically generated dataset based on supervised foot anchor placement timings for rhythmic motions, ensuring periodicity and reducing prediction errors. Our model integrates a discrete cosine transform (DCT) to encode motion, refine high-frequency components, and smooth motion sequences, mitigating jittery artifacts. Additionally, we introduce a feed-forward attention mechanism designed to learn from N-window pairs of 3D key-point pose histories for precise future motion prediction. Quantitative and qualitative evaluations on the Human3.6M dataset highlight significant improvements in mean per joint position error (MPJPE) metrics, demonstrating the superiority of our technique over state-of-the-art approaches. We further introduce novel result pose visualizations through the use of generative AI methods.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"129 ","pages":"Article 104244"},"PeriodicalIF":2.5000,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849325000858","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Advancements in prediction of human motion sequences are critical for enabling online virtual reality (VR) users to dance and move in ways that accurately mirror real-world actions, delivering a more immersive and connected experience. However, latency in networked motion tracking remains a significant challenge, disrupting engagement and necessitating predictive solutions to achieve real-time synchronization of remote motions. To address this issue, we propose a novel approach leveraging a synthetically generated dataset based on supervised foot anchor placement timings for rhythmic motions, ensuring periodicity and reducing prediction errors. Our model integrates a discrete cosine transform (DCT) to encode motion, refine high-frequency components, and smooth motion sequences, mitigating jittery artifacts. Additionally, we introduce a feed-forward attention mechanism designed to learn from N-window pairs of 3D key-point pose histories for precise future motion prediction. Quantitative and qualitative evaluations on the Human3.6M dataset highlight significant improvements in mean per joint position error (MPJPE) metrics, demonstrating the superiority of our technique over state-of-the-art approaches. We further introduce novel result pose visualizations through the use of generative AI methods.
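
As a concrete illustration of two ideas named in the abstract, the sketch below shows a DCT-II encoding of a pose-history window (with optional truncation of high-frequency coefficients as a simple smoothing step) and the MPJPE metric. This is a minimal, hypothetical example assuming NumPy/SciPy, a (frames, joints, 3) pose layout, and millimetre coordinates as in Human3.6M; the function names are illustrative, and it is not the authors' NeFT-Net implementation, whose N-window attention model is omitted.

```python
# Hypothetical sketch (not the authors' code): DCT encoding of a pose window
# and the MPJPE metric, two concepts mentioned in the abstract.
import numpy as np
from scipy.fft import dct, idct

def dct_encode(window, keep=None):
    """DCT-II over the time axis of a pose window.

    window : (T, J, 3) array of T frames, J joints, 3D coordinates.
    keep   : optionally keep only the first `keep` low-frequency
             coefficients, a common way to suppress jitter.
    """
    coeffs = dct(window, type=2, norm="ortho", axis=0)   # (T, J, 3)
    if keep is not None:
        coeffs = coeffs[:keep]                            # drop high frequencies
    return coeffs

def dct_decode(coeffs, T):
    """Inverse DCT back to a length-T motion window."""
    if coeffs.shape[0] < T:                               # zero-pad a truncated spectrum
        pad = np.zeros((T - coeffs.shape[0],) + coeffs.shape[1:])
        coeffs = np.concatenate([coeffs, pad], axis=0)
    return idct(coeffs, type=2, norm="ortho", axis=0)

def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth joints, in the input units (mm for Human3.6M)."""
    return np.linalg.norm(pred - target, axis=-1).mean()

# Toy usage: encode a 50-frame, 17-joint history, keep 20 coefficients.
history = np.random.randn(50, 17, 3)
smoothed = dct_decode(dct_encode(history, keep=20), T=50)
print(mpjpe(smoothed, history))
```

In this sketch, truncating the spectrum and inverting it acts as a low-pass filter over each joint trajectory, which is one common way DCT-based motion predictors smooth sequences; the paper's specific refinement of high-frequency components may differ.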
Source journal
Computers & Graphics-UK (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics, with particular interest in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge CG research.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.