Ultra-Low Bitrate Face Video Compression Based on Conversions From 3D Keypoints to 2D Motion Map
Zhao Wang, Bolin Chen, Shurun Wang, Shiqi Wang, Yan Ye, Siwei Ma
IEEE Transactions on Image Processing, vol. 33, pp. 6850-6864, published 2024-12-20. DOI: 10.1109/TIP.2024.3518100
Face video compression is a crucial problem for many online applications, such as video chat and conferencing, live broadcasting, and remote education. Compared with other natural videos, face-centric videos carry abundant structural information, so they can be compactly represented and reconstructed with high quality by deep generative models, which yields promising compression performance. However, existing generative face video compression schemes suffer from an inconsistency between 3D facial motion in the physical world and the evolution of face content in the 2D image plane. To address this, we propose a generative Face Video Compression method based on 3D keypoints and 2D motion, named FVC-3K2M, which ensures perceptual compensation and visual consistency between the motion description and the face reconstruction. In particular, the temporal evolution of a face video is characterized by separate 3D keypoints from global and local perspectives, which provides considerable coding flexibility and accurate motion representation. Moreover, a cascade motion conversion mechanism internally converts the 3D keypoints into 2D dense motion, so that the reconstructed face video is perceptually realistic. Finally, an adaptive reference frame selection scheme is developed to better adapt to diverse temporal movements. Experimental results show that the proposed scheme enables reliable video communication at extremely limited bandwidths, e.g., 2 kbps. Extensive comparisons with state-of-the-art video coding standards and the latest face video compression methods demonstrate that the proposed scheme achieves superior compression performance under multiple quality evaluation metrics.
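The abstract only names the components, so the following is a minimal, hypothetical sketch of the general idea, not the FVC-3K2M implementation. It assumes a keypoint formulation common in talking-head generation, x_k = R c_k + t + δ_k, where a global head pose (rotation R, translation t) acts on canonical keypoints c_k and per-keypoint offsets δ_k carry local expression. The 3D keypoints are then projected orthographically to 2D and turned into a dense 2D motion map by Gaussian-weighted blending, a hand-crafted stand-in for the paper's learned cascade conversion. The reference-frame rule (mean 3D keypoint displacement above a threshold) and the 25 fps frame rate used for the bit budget are also assumptions, and every function name here is illustrative.

```python
import math
from typing import List, Tuple

# Hypothetical sketch under stated assumptions; not the FVC-3K2M implementation.
Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rotation_yaw(theta: float) -> Mat3:
    """Rotation about the vertical (y) axis by `theta` radians (a head turn)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def keypoints_3d(canonical: List[Vec3], rot: Mat3, trans: Vec3,
                 expr: List[Vec3]) -> List[Vec3]:
    """Global pose (rot, trans) plus local expression offsets:
    x_k = R @ c_k + t + delta_k for every canonical keypoint c_k."""
    out = []
    for c, d in zip(canonical, expr):
        rc = [sum(rot[i][j] * c[j] for j in range(3)) for i in range(3)]
        out.append((rc[0] + trans[0] + d[0],
                    rc[1] + trans[1] + d[1],
                    rc[2] + trans[2] + d[2]))
    return out


def project_2d(kps: List[Vec3]) -> List[Vec2]:
    """Orthographic projection to the image plane: drop depth (z)."""
    return [(x, y) for x, y, _ in kps]


def dense_motion_2d(ref_kp: List[Vec2], cur_kp: List[Vec2], height: int,
                    width: int, sigma: float = 0.2,
                    bg_weight: float = 1e-3) -> List[List[Vec2]]:
    """Sparse-to-dense conversion. For each pixel p of the current frame,
    return an offset (dx, dy) so that the reference frame is sampled at p + offset.
    Per-keypoint offsets (ref - cur) are blended with Gaussian weights centred
    on the current keypoints; a small zero-motion background term keeps pixels
    far from all keypoints close to the identity mapping.
    Coordinates are normalized to [-1, 1]; height and width must be >= 2."""
    flow = []
    for r in range(height):
        y = -1.0 + 2.0 * r / (height - 1)
        row = []
        for c in range(width):
            x = -1.0 + 2.0 * c / (width - 1)
            total, dx, dy = bg_weight, 0.0, 0.0
            for (rx, ry), (cx, cy) in zip(ref_kp, cur_kp):
                w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                total += w
                dx += w * (rx - cx)
                dy += w * (ry - cy)
            row.append((dx / total, dy / total))
        flow.append(row)
    return flow


def needs_new_reference(ref_kp: List[Vec3], cur_kp: List[Vec3],
                        threshold: float = 0.3) -> bool:
    """Assumed rule: request a fresh reference frame when the mean 3D
    keypoint displacement from the current reference exceeds `threshold`."""
    mean_disp = sum(math.dist(a, b) for a, b in zip(ref_kp, cur_kp)) / len(ref_kp)
    return mean_disp > threshold


def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average bit budget per frame at a given channel rate."""
    return bitrate_bps / fps


# Usage: the head shifts 0.1 to the right between reference and current frame.
canonical = [(-0.6, -0.5, 0.0), (0.4, -0.5, 0.0), (-0.1, 0.5, 0.1)]
no_expr = [(0.0, 0.0, 0.0)] * 3
identity = rotation_yaw(0.0)
ref3d = keypoints_3d(canonical, identity, (0.0, 0.0, 0.0), no_expr)
cur3d = keypoints_3d(canonical, identity, (0.1, 0.0, 0.0), no_expr)
flow = dense_motion_2d(project_2d(ref3d), project_2d(cur3d), height=5, width=5)
print(flow[1][1])  # pixel on the first current keypoint: dx close to -0.1, dy = 0
print(needs_new_reference(ref3d, cur3d))  # False: mean shift 0.1 < 0.3
print(bits_per_frame(2000, 25))  # 80.0
```

At 2 kbps and an assumed 25 fps, each frame has only about 80 bits on average, which illustrates why compact motion descriptions such as keypoints are attractive at this rate. In a practical generative codec the Gaussian blend would typically be replaced by a learned network, and the resulting map would warp reference-frame features before a generator synthesizes the output frame.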