CCTFaceNet: Enhancing face super-resolution with cascaded CNN-transformer and dual-path feature fusion

IF 3.0 · Zone 3 (Engineering Technology) · JCR Q2: Engineering, Electrical & Electronic

Digital Signal Processing · Pub Date: 2025-08-26 · DOI: 10.1016/j.dsp.2025.105557

Naveen Kumar Tiwari, Shyam Singh Rajput, Raj Patel

Volume 168, Article 105557 · Journal article · Source: https://www.sciencedirect.com/science/article/pii/S1051200425005792

Citations: 0

Abstract



Convolutional neural networks (CNNs) have significantly advanced face super-resolution, enabling the restoration of degraded facial details. However, these methods often run into limitations related to computational cost. In addition, the limited receptive fields of CNNs can hinder realistic and natural reconstruction of facial images. Transformer-based models counter this issue through global feature learning with multi-head self-attention, but they lack the incremental feature learning capabilities of CNNs. This paper proposes a Cascaded CNN-Transformer-based Face image super-resolution Network (CCTFaceNet) to address these issues. To enrich the input low-resolution face images, a Preliminary Super-Resolution (PreSR) network is placed at the beginning of CCTFaceNet. The output of PreSR is then fed to a deep feature extraction block built from dual-path feature fusion (DPFF) blocks. Each DPFF block has two internal paths, one for a CNN and the other for the cascaded attention transformer (CAT). DPFF also has a context-resolving unit responsible for filtering out redundant information. CAT consists of a shifted-window multi-head self-attention, a multi-scale edge attention, and a channel importance recalibration module, assembled in a cascaded manner. This assembly can reconstruct highly accurate details, both spatially and along the channel dimension, with crisp edges. A feed-forward layer is also placed within the attention cascade. The extracted deep features are upsampled using pixel-shuffle and sub-pixel convolutional layers. Extensive experiments on several benchmark datasets affirm the superiority of the proposed network.
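The pixel-shuffle step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scale factor, channel counts, or layer configuration, so the function name `pixel_shuffle`, the factor `r = 2`, and the toy input below are assumptions made only for illustration. The channel ordering follows the commonly used sub-pixel convention, in which channel `c*r*r + i*r + j` supplies the pixel at offset `(i, j)` inside each `r x r` output cell; this is not the authors' implementation.

```python
def pixel_shuffle(x, r):
    """Rearrange a nested list shaped [C*r*r][H][W] into [C][H*r][W*r].

    Illustrative only: the name and the plain-list representation are
    assumptions, not taken from the paper.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    # Allocate the upsampled output, filled with zeros.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r cell
            for j in range(r):      # column offset inside each r x r cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x2 feature maps become a single 2x4 map when r = 2.
features = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(features, 2)
print(upsampled)  # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

In a network, a convolution would first expand the feature maps to `C*r*r` channels, and this rearrangement would then trade those channels for spatial resolution without any interpolation.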


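Source journal

Digital Signal Processing (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 5.30
Self-citation rate: 17.20%
Articles published: 435
Review time: 66 days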

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Its objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology, such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy