{"title":"KSOF: Leveraging kinematics and spatio-temporal optimal fusion for human motion prediction","authors":"Rui Ding , KeHua Qu , Jin Tang","doi":"10.1016/j.patcog.2024.111206","DOIUrl":null,"url":null,"abstract":"<div><div>Ignoring the meaningful kinematics law, which generates improbable or impractical predictions, is one of the obstacles to human motion prediction. Current methods attempt to tackle this problem by taking simple kinematics information as auxiliary features to improve predictions. However, it remains challenging to utilize human prior knowledge deeply, such as the trajectory formed by the same joint should be smooth and continuous in this task. In this paper, we advocate explicitly describing kinematics information via velocity and acceleration by proposing a novel loss called joint point smoothness (JPS) loss, which calculates the acceleration of joints to smooth the sudden change in joint velocity. In addition, capturing spatio-temporal dependencies to make feature representations more informative is also one of the obstacles in this task. Therefore, we propose a dual-path network (KSOF) that models the temporal and spatial dependencies from kinematic temporal convolutional network (K-TCN) and spatial graph convolutional networks (S-GCN), respectively. Moreover, we propose a novel multi-scale fusion module named spatio-temporal optimal fusion (SOF) to enhance extraction of the essential correlation and important features at different scales from spatio-temporal coupling features. We evaluate our approach on three standard benchmark datasets, including Human3.6M, CMU-Mocap, and 3DPW datasets. For both short-term and long-term predictions, our method achieves outstanding performance on all these datasets. 
The code is available at <span><span>https://github.com/qukehua/KSOF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111206"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324009579","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Ignoring meaningful kinematic laws, which leads to improbable or impractical predictions, is one of the obstacles in human motion prediction. Current methods attempt to tackle this problem by using simple kinematic information as auxiliary features to improve predictions. However, it remains challenging to deeply exploit human prior knowledge, such as the fact that the trajectory traced by a joint should be smooth and continuous. In this paper, we advocate explicitly describing kinematic information via velocity and acceleration by proposing a novel loss, the joint point smoothness (JPS) loss, which computes joint accelerations to smooth sudden changes in joint velocity. Capturing spatio-temporal dependencies to make feature representations more informative is another obstacle in this task. We therefore propose a dual-path network (KSOF) that models temporal and spatial dependencies with a kinematic temporal convolutional network (K-TCN) and a spatial graph convolutional network (S-GCN), respectively. Moreover, we propose a novel multi-scale fusion module, spatio-temporal optimal fusion (SOF), to enhance the extraction of essential correlations and important features at different scales from spatio-temporal coupling features. We evaluate our approach on three standard benchmarks: the Human3.6M, CMU-Mocap, and 3DPW datasets. For both short-term and long-term prediction, our method achieves outstanding performance on all three datasets. The code is available at https://github.com/qukehua/KSOF.
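To illustrate the idea behind the JPS loss as the abstract describes it (a hypothetical sketch, not the authors' implementation, which is in the linked repository): taking velocity as the first temporal difference of joint positions and acceleration as the second, penalizing the mean squared acceleration discourages sudden changes in joint velocity and so favors smooth, continuous trajectories.

```python
import numpy as np

def jps_loss(joints: np.ndarray) -> float:
    """Hypothetical sketch of a joint point smoothness (JPS) loss.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.
    Velocity is the first temporal difference, acceleration the second;
    the loss is the mean squared acceleration, penalizing abrupt
    changes in joint velocity.
    """
    velocity = np.diff(joints, axis=0)        # shape (T-1, J, 3)
    acceleration = np.diff(velocity, axis=0)  # shape (T-2, J, 3)
    return float(np.mean(acceleration ** 2))

# A constant-velocity trajectory is perfectly smooth: loss ~ 0.
t = np.linspace(0.0, 1.0, 10)[:, None, None]   # (10, 1, 1) time axis
linear = np.tile(t, (1, 2, 3))                 # 2 joints moving linearly in 3-D
print(jps_loss(linear))                        # ~0 (up to floating-point error)

# A trajectory with a sudden jump in position (hence in velocity) is penalized.
jerky = np.concatenate([np.zeros((5, 1, 3)), np.ones((5, 1, 3))], axis=0)
print(jps_loss(jerky))                         # strictly positive
```

In a training setup, such a term would be added to the usual prediction error so that the network is rewarded both for accuracy and for kinematically plausible motion.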
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.