Multi-stage Locally and Long-range Correlated Feature Fusion for Learned In-loop Filter in VVC
B. Kathariya, Zhu Li, Hongtao Wang, G. V. D. Auwera
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP), 13 December 2022. DOI: 10.1109/VCIP56404.2022.10008834
Versatile Video Coding (VVC)/H.266 is currently the state-of-the-art video coding standard, with a significant improvement in coding efficiency over its predecessor, High Efficiency Video Coding (HEVC)/H.265. Nonetheless, VVC is still a block-based video coding technology, and its decoded pictures contain compression artifacts. In VVC, in-loop filters serve to suppress these compression artifacts. In this paper, a convolutional neural network (CNN) is employed to better suppress compression artifacts in VVC. Our approach is unique in obtaining better features by exploiting locally correlated spatial features in the pixel domain as well as long-range correlated spectral features in the discrete cosine transform (DCT) domain. In particular, we utilize CNN features extracted from the DCT-transformed input to capture high-frequency components and induce long-range correlation into the spatial CNN features through multi-stage feature fusion. Our experimental results show that the proposed approach achieves significant coding improvements, with up to 9.70% average Bjøntegaard Delta bitrate (BD-rate) savings under the all-intra (AI) configuration for the luma (Y) component.
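The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the dual-branch idea it describes: a pixel-domain convolutional branch that is fused, at multiple stages, with features computed from a blockwise DCT of the decoded frame. All names here (DualBranchInLoopFilter, FusionStage, dct_embed), the channel widths, the stage count, the 8x8 block size, and the concatenation-based fusion are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of a dual-branch (pixel + DCT domain) in-loop filter.
# Assumptions, not the paper's design: 8x8 blockwise DCT, 3 fusion stages,
# concatenation + 1x1 conv fusion, residual output.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def dct_matrix(n: int = 8) -> torch.Tensor:
    """Orthonormal type-II DCT basis of size n x n."""
    k = torch.arange(n).float()
    basis = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] /= math.sqrt(2)
    return basis * math.sqrt(2.0 / n)


def blockwise_dct(x: torch.Tensor, n: int = 8) -> torch.Tensor:
    """2-D DCT over non-overlapping n x n blocks of a (B, 1, H, W) tensor.

    Returns (B, n*n, H/n, W/n): one channel per DCT coefficient, so
    high-frequency content lands in dedicated channels.
    """
    b, c, h, w = x.shape
    d = dct_matrix(n).to(x.device)
    # (B, C, H/n, W/n, n, n) view of the blocks.
    blocks = x.unfold(2, n, n).unfold(3, n, n)
    # coeffs[i, l] = (D @ block @ D^T)[i, l] for every block.
    coeffs = torch.einsum('ij,bcpqjk,lk->bcilpq', d, blocks, d)
    return coeffs.reshape(b, c * n * n, h // n, w // n)


class FusionStage(nn.Module):
    """One fusion stage: refine spatial features, then inject DCT features."""

    def __init__(self, spat_ch: int, dct_ch: int):
        super().__init__()
        self.spat = nn.Sequential(
            nn.Conv2d(spat_ch, spat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(spat_ch + dct_ch, spat_ch, 1)

    def forward(self, f_spat, f_dct):
        f_spat = self.spat(f_spat)
        # Upsample coarse block-level DCT features to pixel resolution, fuse.
        f_dct_up = F.interpolate(f_dct, size=f_spat.shape[-2:], mode='nearest')
        return self.fuse(torch.cat([f_spat, f_dct_up], dim=1))


class DualBranchInLoopFilter(nn.Module):
    """Residual restoration network with multi-stage spatial/DCT fusion."""

    def __init__(self, stages: int = 3, spat_ch: int = 32, dct_ch: int = 32):
        super().__init__()
        self.head = nn.Conv2d(1, spat_ch, 3, padding=1)
        self.dct_embed = nn.Conv2d(64, dct_ch, 1)  # 64 = 8*8 coefficients
        self.stages = nn.ModuleList(
            FusionStage(spat_ch, dct_ch) for _ in range(stages))
        self.tail = nn.Conv2d(spat_ch, 1, 3, padding=1)

    def forward(self, y):  # y: decoded luma, (B, 1, H, W), H and W % 8 == 0
        f_spat = self.head(y)
        f_dct = self.dct_embed(blockwise_dct(y))
        for stage in self.stages:
            f_spat = stage(f_spat, f_dct)
        return y + self.tail(f_spat)  # predict a residual correction


if __name__ == "__main__":
    # Smoke test on a random "decoded" luma patch.
    net = DualBranchInLoopFilter()
    out = net(torch.rand(1, 1, 64, 64))
    print(out.shape)  # torch.Size([1, 1, 64, 64])
```

Routing each DCT coefficient to its own channel gives the convolutional layers direct access to high-frequency components, and repeating the fusion at every stage is one simple way to propagate the long-range, block-level spectral statistics into the locally correlated spatial features.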