TBag: Three Recipes for Building up a Lightweight Hybrid Network for Real-Time SISR
Authors: Ruoyi Xue; Cheng Cheng; Hang Wang; Hongbin Sun
IEEE Transactions on Multimedia, vol. 27, pp. 5363-5375. Published 2025-02-18. DOI: 10.1109/TMM.2025.3542966
https://ieeexplore.ieee.org/document/10891608/
Convolutional neural networks (CNNs) and Transformers have revolutionized single-image super-resolution (SISR). Although these models have significantly improved performance, their complexity often makes them a poor fit for real-time applications and resource-constrained platforms. In this paper, we propose TBag, a lightweight hybrid network that combines the strengths of CNNs and Transformers to address these challenges. Our method simplifies the Transformer block with three key optimizations: 1) the projection layer applied to the value in the original self-attention operation is removed; 2) the number of tokens is rescaled before the self-attention operation and rescaled back afterwards to reduce computation; 3) the expansion factor of the original feed-forward network (FFN) is adjusted. These optimizations enable an efficient hybrid network tailored for real-time SISR. Notably, the hybrid design of CNN and Transformer further enhances both local detail recovery and global feature modeling. Extensive experiments show that TBag achieves a competitive trade-off between effectiveness and efficiency compared to previous lightweight SISR methods (e.g., +0.42 dB PSNR with an 86.7% reduction in latency). Moreover, TBag's real-time capabilities make it well suited to practical applications, with the TBag-Tiny version achieving up to 59 FPS on hardware devices. Future work will explore the potential of this hybrid approach in other image restoration tasks, such as denoising and deblurring.
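To make the effect of the three optimizations concrete, the following is a rough cost sketch rather than the authors' implementation. It counts approximate multiply-accumulate operations (MACs) for one single-head Transformer block using the standard textbook cost terms. The function name, the `token_scale` factor, the presence of an output projection, and all numeric values are assumptions chosen for illustration; none are taken from the paper, and the cost of the rescaling step itself is ignored.

```python
def transformer_block_macs(num_tokens, dim, *, value_proj=True,
                           token_scale=1, ffn_expansion=4):
    """Approximate MAC count for one single-head Transformer block.

    Illustrative cost model only; the defaults are common textbook
    settings, not values reported for TBag.
    """
    # Optimization 2: attention runs on a reduced token set.
    # (Hypothetical integer reduction factor; rescaling cost is ignored.)
    n = num_tokens // token_scale

    # Query and key projections: two (n x dim) @ (dim x dim) products.
    qk_proj = 2 * n * dim * dim

    # Optimization 1: dropping the value projection removes one such product.
    v_proj = n * dim * dim if value_proj else 0

    # Attention scores (Q K^T) and weighted sum (A V): n * n * dim each.
    attention = 2 * n * n * dim

    # Output projection on the reduced token set (assumed to be present).
    out_proj = n * dim * dim

    # Optimization 3: FFN maps dim -> ffn_expansion * dim -> dim on all tokens.
    ffn = 2 * num_tokens * dim * (ffn_expansion * dim)

    return qk_proj + v_proj + attention + out_proj + ffn


# Hypothetical setting: a 64x64 feature map (4096 tokens) with 64 channels.
baseline = transformer_block_macs(4096, 64)
slim = transformer_block_macs(4096, 64, value_proj=False,
                              token_scale=4, ffn_expansion=2)
print(f"baseline MACs: {baseline:,}")
print(f"slim MACs:     {slim:,}")
print(f"ratio:         {slim / baseline:.3f}")
```

In this model the attention term grows quadratically with the number of tokens, so reducing the token count before attention dominates the savings, while removing the value projection and shrinking the FFN trim the terms that grow linearly with the token count.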
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.