Lu Sun;Fangfang Wu;Wei Ding;Xin Li;Jie Lin;Weisheng Dong;Guangming Shi
IEEE Transactions on Image Processing, vol. 33, pp. 5810-5823. Published 2024-10-08. DOI: 10.1109/TIP.2024.3444315. https://ieeexplore.ieee.org/document/10709843/
Multi-Scale Spatio-Temporal Memory Network for Lightweight Video Denoising
Deep learning-based video denoising methods have improved substantially in recent years. However, the high computational cost of sophisticated network designs has severely limited their use in real-world scenarios. To address this weakness, we propose a multiscale spatio-temporal memory network for fast video denoising, named MSTMN, which aims for a better trade-off between cost and performance. We exploit a multiscale representation based on Gaussian-Laplacian pyramid decomposition so that the reference frame can be restored in a coarse-to-fine manner. Guided by a model-based optimization approach, we design a variance estimation module, an alignment error estimation module, and an adaptive fusion module for each scale of the pyramid representation. For the fusion module, we employ a reconstruction recurrence strategy to incorporate local temporal information. Moreover, we propose a memory enhancement module to exploit global spatio-temporal information. The similarity computation of the spatio-temporal memory network lets the proposed network adaptively search for valuable information at the patch level, which avoids computationally expensive motion estimation and compensation. Experimental results on real-world raw video datasets demonstrate that the proposed lightweight network outperforms current state-of-the-art fast video denoising algorithms such as FastDVDnet, EMVD, and ReMoNet at lower computational cost.
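To make the coarse-to-fine idea behind the pyramid representation concrete, the following is a minimal sketch of a Laplacian-style pyramid, not the MSTMN implementation. It assumes a 2x2 box filter in place of a Gaussian kernel and nearest-neighbour upsampling, and every function name is hypothetical. In a denoiser, each level would presumably be processed before the detail band is added back; here the bands are left untouched, so reconstruction simply returns the input frame.

```python
# Simplified Laplacian pyramid in pure Python (no third-party packages).
# Assumption: a 2x2 box filter stands in for the Gaussian kernel, and
# nearest-neighbour upsampling stands in for smooth interpolation.

def downsample(img):
    """Average each 2x2 block; height and width must be even."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def upsample(img):
    """Repeat every pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def subtract(a, b):
    """Element-wise a - b for two equally sized 2D lists."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    """Element-wise a + b for two equally sized 2D lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_pyramid(img, levels):
    """Return (detail_bands, coarsest); detail bands are ordered fine to coarse."""
    bands = []
    current = img
    for _ in range(levels):
        coarse = downsample(current)
        # The detail band is whatever the coarse level cannot represent.
        bands.append(subtract(current, upsample(coarse)))
        current = coarse
    return bands, current

def reconstruct(bands, coarsest):
    """Coarse-to-fine: upsample the current estimate, then add the detail band."""
    current = coarsest
    for band in reversed(bands):
        current = add(upsample(current), band)
    return current

# Synthetic 8x8 frame used only for illustration.
frame = [[float((3 * y + 5 * x) % 17) for x in range(8)] for y in range(8)]
bands, coarsest = build_pyramid(frame, levels=2)
restored = reconstruct(bands, coarsest)
max_error = max(abs(p - q)
                for rf, rr in zip(frame, restored)
                for p, q in zip(rf, rr))
```

With two levels, the 8x8 frame yields detail bands of size 8x8 and 4x4 plus a 2x2 coarsest level. Working at the coarser levels touches far fewer pixels, which is one plausible reason a multiscale design helps keep the cost low.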