基于深度学习前向参考的基于人体姿态的视频压缩

S. Rajin, M. Murshed, M. Paul, S. Teng, Jiangang Ma
{"title":"基于深度学习前向参考的基于人体姿态的视频压缩","authors":"S. Rajin, M. Murshed, M. Paul, S. Teng, Jiangang Ma","doi":"10.1109/VCIP56404.2022.10008897","DOIUrl":null,"url":null,"abstract":"To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored “big” surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Human pose based video compression via forward-referencing using deep learning\",\"authors\":\"S. Rajin, M. Murshed, M. Paul, S. Teng, Jiangang Ma\",\"doi\":\"10.1109/VCIP56404.2022.10008897\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored “big” surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding.\",\"PeriodicalId\":269379,\"journal\":{\"name\":\"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VCIP56404.2022.10008897\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP56404.2022.10008897","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

为了在同一场景的视频帧中利用高时间相关性,使用基于块的运动估计和补偿技术从已经编码的参考帧中预测当前帧。虽然这种方法可以有效地利用运动物体的平移运动,但它容易受到其他类型的仿射运动和物体遮挡/去遮挡的影响。最近,深度学习已被用于从短视频中模拟特定动作中的人体姿势的高级结构,然后通过使用生成对抗网络(GAN)预测姿势来生成未来时间的虚拟帧。因此,建模人体姿态的高层结构能够通过预测人类行为和确定其轨迹来利用语义相关性。视频监控应用将受益,因为存储的“大”监控数据可以通过估计人体姿态轨迹来压缩,并通过语义关联生成未来帧。本文探索了一种新的视频编码方法,即从已经编码的帧中对人体姿态进行建模,并使用当前生成的帧作为额外的前向参考帧。期望该方法能够以较低的残差预测包含运动目标的块,从而克服传统后向参考帧的局限性。实验结果表明,与标准视频编码相比,该方法可以实现平均高达2.83 dB的PSNR增益和25.93%的比特率节省。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Human pose based video compression via forward-referencing using deep learning
To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored “big” surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信