基于 LSTM 和变压器的人体运动预测神经网络算法研究

IF 0.5 4区数学 Q3 MATHEMATICS

Doklady Mathematics Pub Date : 2024-03-25 DOI:10.1134/S1064562423701624

S. V. Zhiganov, Y. S. Ivanov, D. M. Grabar

{"title":"基于 LSTM 和变压器的人体运动预测神经网络算法研究","authors":"S. V. Zhiganov, Y. S. Ivanov, D. M. Grabar","doi":"10.1134/S1064562423701624","DOIUrl":null,"url":null,"abstract":"<p>The problem of predicting the position of a person on future frames of a video stream is solved, and in-depth experimental studies on the application of traditional and SOTA blocks for this task are carried out. An original architecture of KeyFNet and its modifications based on transform blocks is presented, which is able to predict coordinates in the video stream for 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in the application of a combined algorithm based on multiple FNet blocks with fast Fourier transform as an attention mechanism concatenating the coordinates of key points. Experiments on Human3.6M and on our own real data confirmed the effectiveness of the proposed approach based on FNet blocks, compared to the traditional approach based on LSTM. The proposed algorithm matches the accuracy of advanced models, but outperforms them in terms of speed, uses less computational resources, and thus can be applied in collaborative robotic solutions.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S484 - S493"},"PeriodicalIF":0.5000,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers\",\"authors\":\"S. V. Zhiganov, Y. S. Ivanov, D. M. Grabar\",\"doi\":\"10.1134/S1064562423701624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The problem of predicting the position of a person on future frames of a video stream is solved, and in-depth experimental studies on the application of traditional and SOTA blocks for this task are carried out. An original architecture of KeyFNet and its modifications based on transform blocks is presented, which is able to predict coordinates in the video stream for 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in the application of a combined algorithm based on multiple FNet blocks with fast Fourier transform as an attention mechanism concatenating the coordinates of key points. Experiments on Human3.6M and on our own real data confirmed the effectiveness of the proposed approach based on FNet blocks, compared to the traditional approach based on LSTM. The proposed algorithm matches the accuracy of advanced models, but outperforms them in terms of speed, uses less computational resources, and thus can be applied in collaborative robotic solutions.</p>\",\"PeriodicalId\":531,\"journal\":{\"name\":\"Doklady Mathematics\",\"volume\":\"108 2 supplement\",\"pages\":\"S484 - S493\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2024-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Doklady Mathematics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://link.springer.com/article/10.1134/S1064562423701624\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Doklady Mathematics","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1134/S1064562423701624","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICS","Score":null,"Total":0}

引用次数: 0

摘要

摘要解决了在视频流的未来帧上预测人物位置的问题，并对传统块和 SOTA 块在此任务中的应用进行了深入的实验研究。本文介绍了 KeyFNet 的原始架构及其基于变换块的修改，该架构能够高精度地预测视频流中未来 30、60、90 和 120 帧的坐标。其新颖之处在于应用了基于多个 FNet 块的组合算法，并将快速傅立叶变换作为一种关注机制，将关键点的坐标串联起来。在 Human3.6M 和我们自己的真实数据上进行的实验证实，与基于 LSTM 的传统方法相比，基于 FNet 块的拟议方法非常有效。所提出的算法与先进模型的准确性相当，但在速度方面优于它们，使用的计算资源更少，因此可以应用于协作机器人解决方案中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers

查看原文本刊更多论文

Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers

The problem of predicting the position of a person on future frames of a video stream is solved, and in-depth experimental studies on the application of traditional and SOTA blocks for this task are carried out. An original architecture of KeyFNet and its modifications based on transform blocks is presented, which is able to predict coordinates in the video stream for 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in the application of a combined algorithm based on multiple FNet blocks with fast Fourier transform as an attention mechanism concatenating the coordinates of key points. Experiments on Human3.6M and on our own real data confirmed the effectiveness of the proposed approach based on FNet blocks, compared to the traditional approach based on LSTM. The proposed algorithm matches the accuracy of advanced models, but outperforms them in terms of speed, uses less computational resources, and thus can be applied in collaborative robotic solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Doklady Mathematics 数学-数学

CiteScore

1.00

自引率

16.70%

发文量

审稿时长

3-6 weeks

期刊介绍： Doklady Mathematics is a journal of the Presidium of the Russian Academy of Sciences. It contains English translations of papers published in Doklady Akademii Nauk (Proceedings of the Russian Academy of Sciences), which was founded in 1933 and is published 36 times a year. Doklady Mathematics includes the materials from the following areas: mathematics, mathematical physics, computer science, control theory, and computers. It publishes brief scientific reports on previously unpublished significant new research in mathematics and its applications. The main contributors to the journal are Members of the RAS, Corresponding Members of the RAS, and scientists from the former Soviet Union and other foreign countries. Among the contributors are the outstanding Russian mathematicians.