CCTFaceNet: Enhancing face super-resolution with cascaded CNN-transformer and dual-path feature fusion
Authors: Naveen Kumar Tiwari, Shyam Singh Rajput, Raj Patel
Journal: Digital Signal Processing, Volume 168, Article 105557 (Impact Factor 3.0; JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2025-08-26
DOI: 10.1016/j.dsp.2025.105557
URL: https://www.sciencedirect.com/science/article/pii/S1051200425005792
Convolutional neural networks (CNNs) have significantly advanced face super-resolution, enabling the restoration of degraded facial details. However, these methods often incur high computational cost, and the limited receptive fields of CNNs can hinder realistic and natural reconstruction of facial images. Transformer-based models counter the receptive-field limitation by learning global features with multi-head self-attention, but they lack the incremental feature-learning capability of CNNs. This paper proposes a Cascaded CNN-Transformer-based Face image super-resolution Network (CCTFaceNet) to address these issues. To enrich the low-resolution input face images, a Preliminary Super-Resolution (PreSR) network is placed at the beginning of CCTFaceNet. The output of PreSR is then fed to a deep feature extraction block built from a dual-path feature fusion (DPFF) block. This block has two internal paths, one for the CNN and the other for the cascaded attention transformer (CAT). DPFF also contains a context-resolving unit that filters out redundant information. CAT consists of shifted-window multi-head self-attention, multi-scale edge attention, and a channel importance recalibration module, assembled in a cascaded manner. This assembly can reconstruct accurate details, both spatially and along the channel dimension, with crisp edges. A feed-forward layer is also placed within the attention cascade. The extracted deep features are upsampled using pixel-shuffle and sub-pixel convolutional layers. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed network.
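The abstract names pixel-shuffle upsampling but gives no details. The sketch below illustrates the standard pixel-shuffle rearrangement only, not the authors' implementation. It assumes a channel-first layout and the channel ordering used by common deep learning libraries. The function name `pixel_shuffle` and the nested-list tensor representation are hypothetical choices made for illustration, and the convolution that would normally produce the `C * r * r` input channels is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumed mapping (not taken from the paper):
        out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    height, width = len(x[0]), len(x[0][0])

    # Allocate the upscaled output, filled with zeros.
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Each group of `r * r` input channels supplies the `r x r` block of output pixels at every spatial location, so the upscaling moves information from the channel axis into space without interpolation.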
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemoinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy