Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers

Pub Date: 2024-03-25, DOI: 10.1134/S1064562423701624
S. V. Zhiganov, Y. S. Ivanov, D. M. Grabar

Abstract

We address the problem of predicting a person's position in future frames of a video stream and report in-depth experiments applying both traditional and state-of-the-art (SOTA) blocks to this task. We present KeyFNet, an original architecture, together with modifications based on transformer blocks; it predicts key-point coordinates in the video stream 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in a combined algorithm built from multiple FNet blocks, which use the fast Fourier transform in place of an attention mechanism and operate on the concatenated key-point coordinates. Experiments on Human3.6M and on our own real-world data confirmed that the proposed FNet-based approach is more effective than the traditional LSTM-based approach. The proposed algorithm matches the accuracy of advanced models while running faster and using fewer computational resources, which makes it suitable for collaborative robotics applications.
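The abstract describes FNet blocks in which a Fourier transform takes the place of self-attention. As a rough illustration only, the sketch below follows the generic FNet encoder design (a parameter-free Fourier mixing sublayer followed by a feed-forward sublayer, each with a residual connection and layer normalization) applied to a sequence of concatenated key-point coordinates. The toy input, number of key points, feed-forward width, block count, GELU activation, random weights, and all helper names (`concat_keypoints`, `fourier_mix`, `fnet_block`, etc.) are hypothetical assumptions, not details of KeyFNet. To stay dependency-free, the transform is computed with a direct O(n²) DFT, which yields the same values as an FFT such as `numpy.fft.fft2`.

```python
import cmath
import math
import random

Matrix = list[list[float]]


def dft(seq: list) -> list[complex]:
    """1-D discrete Fourier transform: same values as an FFT, computed in O(n^2)."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mix(x: Matrix) -> Matrix:
    """FNet mixing sublayer Re(DFT_time(DFT_features(x))) for x of shape (frames, features)."""
    n_frames, n_feat = len(x), len(x[0])
    # Transform each frame's feature vector.
    per_frame = [dft(row) for row in x]
    # Transform each feature channel across frames.
    per_feature = [dft([per_frame[t][f] for t in range(n_frames)]) for f in range(n_feat)]
    # Keep the real part and transpose back to (frames, features).
    return [[per_feature[f][t].real for f in range(n_feat)] for t in range(n_frames)]


def layer_norm(row: list[float], eps: float = 1e-5) -> list[float]:
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def gelu(v: float) -> float:
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def feed_forward(row: list[float], w1: Matrix, w2: Matrix) -> list[float]:
    """Two dense layers: d_model -> d_ff (GELU) -> d_model; biases omitted for brevity."""
    hidden = [gelu(sum(r * w for r, w in zip(row, unit))) for unit in w1]
    return [sum(h * w for h, w in zip(hidden, unit)) for unit in w2]


def fnet_block(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Fourier mixing + feed-forward, each followed by a residual add and layer norm."""
    mixed = fourier_mix(x)
    y = [layer_norm([a + b for a, b in zip(r, m)]) for r, m in zip(x, mixed)]
    return [layer_norm([a + b for a, b in zip(r, feed_forward(r, w1, w2))]) for r in y]


def concat_keypoints(frames: list[list[tuple[float, float]]]) -> Matrix:
    """Concatenate the K (x, y) key points of each frame into one 2K-dim vector."""
    return [[c for point in frame for c in point] for frame in frames]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy input: 8 observed frames with 3 key points each.
rng = random.Random(0)
frames = [[(0.1 * t + k, 0.2 * t - k) for k in range(3)] for t in range(8)]
x = concat_keypoints(frames)                 # shape (8, 6)
d_model, d_ff, n_blocks = len(x[0]), 16, 2   # widths and block count are assumed
blocks = [(random_matrix(d_ff, d_model, rng), random_matrix(d_model, d_ff, rng))
          for _ in range(n_blocks)]
out = x
for w1, w2 in blocks:
    out = fnet_block(out, w1, w2)
print(len(out), len(out[0]))  # prints: 8 6
```

Because the Fourier transform has no learned parameters, only the feed-forward sublayers carry weights, which is one reason FNet-style blocks are cheaper than self-attention. A complete predictor would also need a head that maps the encoded sequence to coordinates 30 to 120 frames ahead; the abstract does not describe that part, so it is omitted here.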
