An interpretable convolutional neural network via generalized time–frequency scattering

IF 3.4 · CAS Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing Pub Date : 2025-04-28 DOI:10.1016/j.sigpro.2025.110043

Xiaoping Liu, Gong Chen, Jun Shi, Ran Tao

Signal Processing, vol. 237, Article 110043. https://www.sciencedirect.com/science/article/pii/S0165168425001574

Citations: 0

Abstract

Convolutional neural networks (CNNs) have recently demonstrated impressive performance in complex machine learning tasks. However, a CNN requires a large quantity of annotated data to converge to a good solution, and the theoretical understanding of such networks is still in its infancy. To address these issues, a variant of the CNN, dubbed the deep scattering network (DSN), has been proposed based on linear time–frequency transforms. The DSN inherits the hierarchical structure of the CNN, but uses predefined wavelet/Gabor filters as its convolutional kernels instead of data-driven linear filters. Unfortunately, the DSN has a major drawback: it is suitable for stationary image textures but not for non-stationary ones, since wavelet/Gabor filters are intrinsically linear translation-invariant filters. The aim of this paper is to overcome this deficiency using a generalized linear time–frequency transform, the short-time fractional Fourier transform (STFRFT), which can be interpreted as a bank of linear translation-variant filters and may therefore be well suited to non-stationary texture analysis. We first introduce a generalized time–frequency scattering transform using the STFRFT. Building on this result, we propose an interpretable CNN that cascades STFRFTs and modulus operators. Moreover, several basic properties of the proposed interpretable CNN are derived, and an efficient implementation of the network is presented. Finally, applications of the derived results are discussed.
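
The cascade of STFRFTs and modulus operators described in the abstract can be illustrated with a rough one-dimensional sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the STFRFT is taken to be the commonly used windowed fractional Fourier transform, STFRFT(t, u) = ∫ x(τ) g(τ − t) K_α(τ, u) dτ, with the standard FRFT kernel K_α; the window g is a Gaussian; the final low-pass stage is replaced by a plain time average; and the transform is evaluated by direct O(N²) summation rather than by the efficient implementation the paper mentions. The function names (frft_kernel, stfrft_modulus, scattering) and all parameter values are hypothetical.

```python
import cmath
import math


def frft_kernel(tau, u, alpha):
    """Standard FRFT kernel K_alpha(tau, u); alpha must not be a multiple of pi."""
    cot = math.cos(alpha) / math.sin(alpha)
    csc = 1.0 / math.sin(alpha)
    amp = cmath.sqrt((1 - 1j * cot) / (2 * math.pi))
    return amp * cmath.exp(1j * (0.5 * (tau * tau + u * u) * cot - tau * u * csc))


def stfrft_modulus(x, u, alpha, sigma, dt=1.0):
    """|STFRFT(t, u)| at every sample time t, by direct summation.

    The kernel depends on absolute time tau, not on tau - t, which is
    what makes the equivalent filter translation-variant.
    """
    n = len(x)
    out = []
    for i in range(n):
        acc = 0j
        for k in range(n):
            # Gaussian window g(tau - t) centred on sample i (assumed shape).
            window = math.exp(-0.5 * ((k - i) * dt / sigma) ** 2)
            acc += x[k] * window * frft_kernel(k * dt, u, alpha)
        out.append(abs(acc) * dt)
    return out


def scattering(x, us, alpha, sigma, depth=2, dt=1.0):
    """Map each path (u1, ..., um), m = 1..depth, to a time-averaged coefficient.

    The zeroth-order term is omitted, and the global mean stands in for
    the low-pass filter of a scattering network (an assumption).
    """
    feats = {}
    layer = {(): list(x)}
    for _ in range(depth):
        nxt = {}
        for path, sig in layer.items():
            for u in us:
                child = stfrft_modulus(sig, u, alpha, sigma, dt)
                nxt[path + (u,)] = child
                feats[path + (u,)] = sum(child) / len(child)
        layer = nxt
    return feats


# Usage: a linear chirp whose rate matches cot(alpha), so that in the
# fractional domain it collapses onto u = 0.
cot = 0.02
alpha = math.atan2(1.0, cot)  # chosen so that cot(alpha) = 0.02
x = [cmath.exp(-0.5j * cot * k * k) for k in range(48)]
feats = scattering(x, us=(0.0, 0.8, 1.6), alpha=alpha, sigma=4.0)
```

With these choices the first-order coefficient at u = 0 dominates the other two, because multiplying the matched chirp by the kernel leaves a pure tone at frequency u csc α. An ordinary Gabor filter bank has no such absolute-time chirp factor and would spread the same chirp across several frequency channels.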


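Source journal

Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.20
Self-citation rate: 9.10%
Articles published: 309
Review time: 41 days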

Journal description: Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in the research, development or practical application of signal processing. Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.