A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising

Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim
Abstract
State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial denoising module extracts features from the current frame, while the reset gate selects relevant information from the previous frame and fuses it with current frame features via the temporal denoising module. The update gate then further blends this result with the previous frame features, and the reconstruction module integrates it with the current frame.
To robustly compute attention for noisy features, we propose a residual simplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and temporal denoising modules. Comparative objective and subjective results show that our GRTN achieves denoising performance comparable to SOTA multi-frame delay networks, with only a single-frame delay.