Chunyan Zeng, Yuhao Zhao, Zhifeng Wang, Kun Li, Xiangkui Wan, Min Liu
{"title":"Squeeze-and-Excitation Self-Attention Mechanism Enhanced Digital Audio Source Recognition Based on Transfer Learning","authors":"Chunyan Zeng, Yuhao Zhao, Zhifeng Wang, Kun Li, Xiangkui Wan, Min Liu","doi":"10.1007/s00034-024-02850-8","DOIUrl":"https://doi.org/10.1007/s00034-024-02850-8","url":null,"abstract":"<p>Recent advances in digital audio source recognition, particularly within judicial forensics and intellectual property rights domains, have been significantly propelled by deep learning technologies. As these methods evolve, they introduce novel models and enhance processing capabilities crucial for audio source recognition research. Despite these advancements, the limited availability of high-quality labeled samples and the labor-intensive nature of data labeling remain substantial challenges. This paper addresses these challenges by exploring the efficacy of self-attention mechanisms, specifically through a novel neural network that integrates the Squeeze-and-Excitation (SE) self-attention mechanism for identifying recording devices. Our study not only demonstrates a relative improvement of approximately 1.5% in all four evaluation metrics over traditional convolutional neural networks but also compares the performance across two public datasets. Furthermore, we delve into the self-attention mechanism’s adaptability across different network architectures by embedding the Squeeze-and-Excitation mechanism within both residual and conventional convolutional network frameworks. Through ablation studies and comparative analyses, we reveal that the impact of self-attention mechanisms varies significantly with the underlying network architecture. Additionally, employing a transfer learning strategy has allowed us to leverage data from a baseline network with extensive samples, applying it to a smaller dataset to successfully identify 141 devices. This approach resulted in performance enhancements ranging from 4% to 7% across various metrics, highlighting the transfer learning method’s role in advancing digital audio source identification research. These findings not only validate the Squeeze-and-Excitation self-attention mechanism’s effectiveness in audio source recognition but also illustrate the broader applicability and benefits of incorporating advanced learning strategies in overcoming data scarcity and enhancing model adaptability.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete-Time Delta-Sigma Modulator with Successively Approximating Register ADC Assisted Analog Feedback Technique","authors":"Hsin-Liang Chen, Hsiao-Hsing Chou, Hong-Ming Chiu, Hung-Chi Chang, Jen-Shiun Chiang","doi":"10.1007/s00034-024-02832-w","DOIUrl":"https://doi.org/10.1007/s00034-024-02832-w","url":null,"abstract":"<p>This paper proposes a delta-sigma modulator (DSM) for audio band applications with low-area cost and high-resolution performance characteristics. The proposed circuit is implemented by discrete-time switched capacitor circuits. It employs an assisted 6-bit successive approximation register (SAR) analog-to-digital converter (ADC) as the quantizer. Most importantly, it combines and shares the resistive digital-to-analog (DAC) in DSM and SAR ADC. Therefore, it can achieve high-efficiency advantages and reduce the chip layout cost. After all, the chip area is only 0.096 mm<sup>2</sup> by the 0.18 um 1P6M CMOS process. It achieves 96 dB dynamic range (DR), 83.1 dB signal to noise and distortion ratio (SNDR), and 93.4 dB signal to noise ratio (SNR) with 25 kHz signal bandwidth and oversampling ratio (OSR) of 64.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive Windowed Variational Mode Decomposition","authors":"Zhaoheng Zhou, Bingo Wing-Kuen Ling, Nuo Xu","doi":"10.1007/s00034-024-02864-2","DOIUrl":"https://doi.org/10.1007/s00034-024-02864-2","url":null,"abstract":"<p>The variational mode decomposition (VMD) and its variants aim to decompose a given signal into a set of narrow band modes. The analysis of these modes is usually based on the Fourier analysis. That is, the center frequencies of these modes are found without exploiting the local time varying information of the signal during the iteration in the existing algorithms for performing the VMD. To address this issue, this paper proposes a recursive windowed VMD (RWVMD) approach for performing the signal decomposition. First, the window is sliding across the signal. Then, the variational mode extraction is performed on each frame to obtain the first mode. Then, the difference between the first mode and the signal is computed to obtain the residual signal. The above process is repeated on the residual signal until the algorithm converges. The effectiveness of the RWVMD algorithm is demonstrated through the computer numerical simulations. It is found that the center frequency in the time frequency plane is more accurately matched with the characteristics of the original signal.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Triggered $$H_{\\infty }$$ Filtering for A Class of Nonlinear Systems Under DoS Attacks","authors":"Weiguo Ma, Yuanqiang Zhou, Xin Lai, Furong Gao","doi":"10.1007/s00034-024-02775-2","DOIUrl":"https://doi.org/10.1007/s00034-024-02775-2","url":null,"abstract":"<p>This paper investigates event-triggered <span>\\(H_{\\infty }\\)</span> filtering for a class of discrete-time nonlinear systems subject to denial-of-service (DoS) attacks. Since the communication network in the networked systems is vulnerable to malicious cyber-attacks, this paper models DoS attacks as a Bernoulli random variable, which results in stochastic filtering error system. Besides, we use adaptive event-triggered communication to ensure that the least amount of information is transmitted over the network. For the filtering error system under the effect of event-triggered communication and DoS attacks, we provide sufficient conditions on guaranteeing the stability and prescribed <span>\\(H_{\\infty }\\)</span> performance, where the <span>\\(H_{\\infty }\\)</span> filter and event-triggered parameters are co-designed using the linear matrix inequality approach. Finally, two illustrative examples are provided to demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Individually Weighted Modified Logarithmic Hyperbolic Sine Curvelet Based Recursive FLN for Nonlinear System Identification","authors":"Neetu Chikyal, Vasundhara, Chayan Bhar, Asutosh Kar, Mads Graesboll Christensen","doi":"10.1007/s00034-024-02839-3","DOIUrl":"https://doi.org/10.1007/s00034-024-02839-3","url":null,"abstract":"<p>Lately, an adaptive exponential functional link network (AEFLN) involving exponential terms integrated with trigonometric functional expansion is being introduced as a linear-in-the-parameters nonlinear filter. However, they exhibit degraded efficacy in lieu of non-Gaussian or impulsive noise interference. Therefore, to enhance the nonlinear modelling capability, here is a modified logarithmic hyperbolic sine cost function in amalgamation with the adaptive recursive exponential functional link network. In conjugation with this, a sparsity constraint motivated by a curvelet-dependent notion is employed in the suggested approach. Therefore, this paper presents an individually weighted modified logarithmic hyperbolic sine curvelet-based recursive exponential FLN (IMLSC-REF) for robust sparse nonlinear system identification. An individually weighted adaptation gain is imparted to several coefficients corresponding to the nonlinear adaptive model for accelerating the convergence rate. The weight update rule and the maximum criteria for the convergence factor are being further derived. Exhaustive simulation studies profess the effectiveness of the introduced algorithm in case of varied nonlinearity and for identifying as well as modelling the physical path of the acoustic feedback phenomenon of a behind-the-ear (BTE) hearing aid.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure and Imperceptible Frequency-Based Watermarking for Medical Images","authors":"Saadaoui Naima, Akram Zine Eddine Boukhamla, Zermi Narima, Khaldi Amine, Kafi Med Redouane, Aditya Kumar Sahu","doi":"10.1007/s00034-024-02814-y","DOIUrl":"https://doi.org/10.1007/s00034-024-02814-y","url":null,"abstract":"<p>Medical image security is a critical concern in the healthcare domain, and various watermarking techniques have been explored to embed imperceptible and secure data within medical images. This paper introduces an innovative frequency-based watermarking technique for medical images, utilizing the Fractional Discrete Cosine Transform (FDCT) and Schur decomposition to ensure robust and secure watermark embedding. The watermark bits are integrated by modulating the obtained Schur coefficients, thereby ensuring robust and secure watermarking without significantly altering the visual quality of the medical images. The experiments conducted on the ocular database demonstrate the capacity, imperceptibility, and robustness of the proposed method. This approach achieved a favorable trade-off between imperceptibility and information embedding capacity for ensuring the authenticity and integrity of medical images during transmission.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Online Two-Stage Classification Based on Projections","authors":"Aimin Song, Yan Wang, Shengyang Luan","doi":"10.1007/s00034-024-02843-7","DOIUrl":"https://doi.org/10.1007/s00034-024-02843-7","url":null,"abstract":"<p>Kernel-based online classification algorithms, such as the Perceptron, NORMA, and passive-aggressive, are renowned for their computational efficiency but have been criticized for slow convergence. However, the parallel projection algorithm, within the adaptive projected subgradient method framework, exhibits accelerated convergence and enhanced noise resilience. Despite these advantages, a specific sparsification procedure for the parallel projection algorithm is currently absent. Additionally, existing online classification algorithms, including those mentioned earlier, heavily rely on the kernel width parameter, rendering them sensitive to its choices. In an effort to bolster the performance of these algorithms, we propose a two-stage classification algorithm within the Cartesian product space of reproducing kernel Hilbert spaces. In the initial stage, we introduce an online double-kernel classifier with parallel projection. This design aims not only to improve convergence but also to address the sensitivity to kernel width. In the subsequent stage, the component with a larger kernel width remains fixed, while the component with a smaller kernel width undergoes updates. To promote sparsity and mitigate model complexity, we incorporate the projection-along-subspace technique. Moreover, for enhanced computational efficiency, we integrate the set-membership technique into the updates, selectively exploiting informative vectors to improve the classifier. The monotone approximation of the proposed classifier, based on the designed <span>\\( \\epsilon \\)</span>-insensitive function, is presented. Finally, we apply the proposed algorithm to equalize a nonlinear channel. Simulation results demonstrate that the proposed classifier achieves faster convergence and lower misclassification error with comparable model complexity.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SWAM-Net $$+$$ : Selective Wavelet Attentive M-Network $$+$$ for Single Image Dehazing","authors":"Raju Nuthi, Srinivas Kankanala","doi":"10.1007/s00034-024-02837-5","DOIUrl":"https://doi.org/10.1007/s00034-024-02837-5","url":null,"abstract":"<p>Image dehazing is an ill-posed issue in low-level computer vision; therefore, it grabbed many researchers’ attention. The key mechanism to improve dehazing performance remains unclear, although many existing network pipelines work fine. To improve the performance of the image dehazing network, a hierarchical model named “Selective Attentive Wavelet M-Net+” (SWAM-Net+) was proposed. In order to enrich the features from the wavelet domain, a “Selective Wavelet Attentive Module” was introduced in M-Net+. Several key components of our network are used for extracting the multiscale features through parallel multi-resolution convolution channels. Contextual information is collected using a dual attention unit, and the attention is based on multiscale feature aggregation. We replaced summation and concatenation operations by introducing the Selective Kernel Feature Fusing module to achieve feature aggregation. Furthermore, our network achieves comprehensively better performance results on the RESIDE dataset both qualitatively and quantitatively.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Flat Broadband Passive Circuit with Negative Group Delay","authors":"Aixia Yuan, Niannan Chang, Junzheng Liu, Yuwei Meng","doi":"10.1007/s00034-024-02844-6","DOIUrl":"https://doi.org/10.1007/s00034-024-02844-6","url":null,"abstract":"<p>A novel flat negative group delay circuit is proposed. The flatter negative group delay is achieved, and the insertion loss is reduced. The circuit structure consists of a resistor <i>R</i>1 series connected with a capacitor <i>C</i>1 and inductor <i>L</i>1 in parallel, followed by a capacitor <i>C</i>2 and inductor <i>L</i>2 in parallel, and finally connected in series with a capacitor <i>C</i>3 and inductor <i>L</i>3. The analysis design equation is provided. The effects of different component values on circuit flatness and bandwidth are analyzed. According to this design method, a flat negative group delay circuit is designed and fabricated. The simulation and measurement results are basically consistent. It has good flat negative group delay characteristics, with a group delay fluctuation of 3%, a group delay value of −1.02 ns, and an insertion loss of 7 dB. The feasibility of the design method is verified.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chandrasekhar Paseddula, Suryakanth V. Gangashetty
{"title":"Acoustic Scene Classification Using Various Features and DNN Model: A Monolithic and Hierarchical Approach","authors":"Chandrasekhar Paseddula, Suryakanth V. Gangashetty","doi":"10.1007/s00034-024-02836-6","DOIUrl":"https://doi.org/10.1007/s00034-024-02836-6","url":null,"abstract":"<p>An acoustic scene is a complicated phenomenon; thus, it would be difficult to draw out scene-specific information from the foreground and background sound sources. To accurately discern the sound sceneries and pinpoint the distinct sound occurrences in realistic soundscapes, more study is still required. Investigating a good feature representation is helpful for acoustic scene classification (ASC). This study investigated a few common acoustic features for ASC, including the mel-frequency cepstral coefficients (MFCC), log-mel band energy (LOGMEL), linear prediction cepstral coefficients (LPCC), and all-pole group delay (APGD). To represent acoustic scenes, we proposed a variety of features based on speaker/music recognition, including inverted mel-frequency cepstral coefficients, spectral centroid magnitude coefficients, sub-band spectral flux coefficients, and single frequency filtering cepstral coefficients. Using DNN classification models, it has been investigated how these features affect the classification of acoustic scenes in the DCASE 2017 dataset. Our analysis shows that no single feature has performed better than the others for all acoustic scenarios. In general, it may be challenging for a single classifier to successfully identify all the classes when there are more acoustic scenes. Therefore, we have proposed a two-level hierarchical classification approach. This is accomplished by first determining the meta-category of the acoustic scene, followed by the fine-grained classification that falls under each meta-category. From our studies, it is observed that, the hierarchical approach has performed (81.0%) better than the monolithic classification approach (79.9%) without DNN score fusion at level 2 as post processing. The performance of the ASC system can be further improved by exploring more sophisticated complementary features. The fusion of MFCC AND LOGMEL features based monolithic system resulted in an accuracy of 90.5%. The proposed hierarchical system results in accuracy of 82.6% with DNN score fusion at level 2 as post processing.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}