{"title":"KSOF: Leveraging kinematics and spatio-temporal optimal fusion for human motion prediction","authors":"Rui Ding , KeHua Qu , Jin Tang","doi":"10.1016/j.patcog.2024.111206","DOIUrl":null,"url":null,"abstract":"<div><div>Ignoring the meaningful kinematics law, which generates improbable or impractical predictions, is one of the obstacles to human motion prediction. Current methods attempt to tackle this problem by taking simple kinematics information as auxiliary features to improve predictions. However, it remains challenging to utilize human prior knowledge deeply, such as the trajectory formed by the same joint should be smooth and continuous in this task. In this paper, we advocate explicitly describing kinematics information via velocity and acceleration by proposing a novel loss called joint point smoothness (JPS) loss, which calculates the acceleration of joints to smooth the sudden change in joint velocity. In addition, capturing spatio-temporal dependencies to make feature representations more informative is also one of the obstacles in this task. Therefore, we propose a dual-path network (KSOF) that models the temporal and spatial dependencies from kinematic temporal convolutional network (K-TCN) and spatial graph convolutional networks (S-GCN), respectively. Moreover, we propose a novel multi-scale fusion module named spatio-temporal optimal fusion (SOF) to enhance extraction of the essential correlation and important features at different scales from spatio-temporal coupling features. We evaluate our approach on three standard benchmark datasets, including Human3.6M, CMU-Mocap, and 3DPW datasets. For both short-term and long-term predictions, our method achieves outstanding performance on all these datasets. 
The code is available at <span><span>https://github.com/qukehua/KSOF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111206"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324009579","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Ignoring meaningful kinematic laws, which leads to improbable or impractical predictions, is one of the obstacles in human motion prediction. Current methods attempt to tackle this problem by using simple kinematic information as auxiliary features to improve predictions. However, it remains challenging to deeply exploit human prior knowledge, such as the fact that the trajectory traced by a joint should be smooth and continuous. In this paper, we advocate explicitly describing kinematic information via velocity and acceleration by proposing a novel loss, the joint point smoothness (JPS) loss, which computes joint accelerations to smooth sudden changes in joint velocity. Capturing spatio-temporal dependencies to make feature representations more informative is another obstacle in this task. We therefore propose a dual-path network (KSOF) that models temporal and spatial dependencies with a kinematic temporal convolutional network (K-TCN) and a spatial graph convolutional network (S-GCN), respectively. Moreover, we propose a novel multi-scale fusion module, spatio-temporal optimal fusion (SOF), to enhance the extraction of essential correlations and important features at different scales from spatio-temporal coupling features. We evaluate our approach on three standard benchmarks: the Human3.6M, CMU-Mocap, and 3DPW datasets. For both short-term and long-term prediction, our method achieves outstanding performance on all three datasets. The code is available at https://github.com/qukehua/KSOF.
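To illustrate the idea behind the JPS loss as the abstract describes it (a hypothetical sketch, not the authors' implementation, which is in the linked repository): taking velocity as the first temporal difference of joint positions and acceleration as the second, penalizing the mean squared acceleration discourages sudden changes in joint velocity and so favors smooth, continuous trajectories.

```python
import numpy as np

def jps_loss(joints: np.ndarray) -> float:
    """Hypothetical sketch of a joint point smoothness (JPS) loss.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.
    Velocity is the first temporal difference, acceleration the second;
    the loss is the mean squared acceleration, penalizing abrupt
    changes in joint velocity.
    """
    velocity = np.diff(joints, axis=0)        # shape (T-1, J, 3)
    acceleration = np.diff(velocity, axis=0)  # shape (T-2, J, 3)
    return float(np.mean(acceleration ** 2))

# A constant-velocity trajectory is perfectly smooth: loss ~ 0.
t = np.linspace(0.0, 1.0, 10)[:, None, None]   # (10, 1, 1) time axis
linear = np.tile(t, (1, 2, 3))                 # 2 joints moving linearly in 3-D
print(jps_loss(linear))                        # ~0 (up to floating-point error)

# A trajectory with a sudden jump in position (hence in velocity) is penalized.
jerky = np.concatenate([np.zeros((5, 1, 3)), np.ones((5, 1, 3))], axis=0)
print(jps_loss(jerky))                         # strictly positive
```

In a training setup, such a term would be added to the usual prediction error so that the network is rewarded both for accuracy and for kinematically plausible motion.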
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.