Squeeze-and-Excitation Self-Attention Mechanism Enhanced Digital Audio Source Recognition Based on Transfer Learning

Impact Factor 1.8 · CAS Tier 3 (Engineering & Technology) · JCR Q3 · ENGINEERING, ELECTRICAL & ELECTRONIC
Chunyan Zeng, Yuhao Zhao, Zhifeng Wang, Kun Li, Xiangkui Wan, Min Liu
{"title":"Squeeze-and-Excitation Self-Attention Mechanism Enhanced Digital Audio Source Recognition Based on Transfer Learning","authors":"Chunyan Zeng, Yuhao Zhao, Zhifeng Wang, Kun Li, Xiangkui Wan, Min Liu","doi":"10.1007/s00034-024-02850-8","DOIUrl":null,"url":null,"abstract":"<p>Recent advances in digital audio source recognition, particularly within judicial forensics and intellectual property rights domains, have been significantly propelled by deep learning technologies. As these methods evolve, they introduce novel models and enhance processing capabilities crucial for audio source recognition research. Despite these advancements, the limited availability of high-quality labeled samples and the labor-intensive nature of data labeling remain substantial challenges. This paper addresses these challenges by exploring the efficacy of self-attention mechanisms, specifically through a novel neural network that integrates the Squeeze-and-Excitation (SE) self-attention mechanism for identifying recording devices. Our study not only demonstrates a relative improvement of approximately 1.5% in all four evaluation metrics over traditional convolutional neural networks but also compares the performance across two public datasets. Furthermore, we delve into the self-attention mechanism’s adaptability across different network architectures by embedding the Squeeze-and-Excitation mechanism within both residual and conventional convolutional network frameworks. Through ablation studies and comparative analyses, we reveal that the impact of self-attention mechanisms varies significantly with the underlying network architecture. Additionally, employing a transfer learning strategy has allowed us to leverage data from a baseline network with extensive samples, applying it to a smaller dataset to successfully identify 141 devices. This approach resulted in performance enhancements ranging from 4% to 7% across various metrics, highlighting the transfer learning method’s role in advancing digital audio source identification research. These findings not only validate the Squeeze-and-Excitation self-attention mechanism’s effectiveness in audio source recognition but also illustrate the broader applicability and benefits of incorporating advanced learning strategies in overcoming data scarcity and enhancing model adaptability.</p>","PeriodicalId":10227,"journal":{"name":"Circuits, Systems and Signal Processing","volume":"94 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Circuits, Systems and Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s00034-024-02850-8","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Recent advances in digital audio source recognition, particularly in judicial forensics and intellectual property protection, have been significantly propelled by deep learning technologies. As these methods evolve, they introduce novel models and enhance processing capabilities crucial for audio source recognition research. Despite these advancements, the limited availability of high-quality labeled samples and the labor-intensive nature of data labeling remain substantial challenges. This paper addresses these challenges by exploring the efficacy of self-attention mechanisms, specifically through a novel neural network that integrates the Squeeze-and-Excitation (SE) self-attention mechanism for identifying recording devices. Our study demonstrates a relative improvement of approximately 1.5% in all four evaluation metrics over traditional convolutional neural networks and compares performance across two public datasets. Furthermore, we examine the self-attention mechanism's adaptability across different network architectures by embedding the Squeeze-and-Excitation mechanism within both residual and conventional convolutional network frameworks. Through ablation studies and comparative analyses, we show that the impact of self-attention mechanisms varies significantly with the underlying network architecture. Additionally, a transfer learning strategy allows us to transfer knowledge from a baseline network trained on an extensive dataset to a smaller dataset, successfully identifying 141 recording devices. This approach yields performance gains of 4% to 7% across the evaluation metrics, highlighting the role of transfer learning in advancing digital audio source identification research. These findings not only validate the effectiveness of the Squeeze-and-Excitation self-attention mechanism in audio source recognition but also illustrate the broader applicability and benefits of incorporating advanced learning strategies to overcome data scarcity and enhance model adaptability.
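The abstract refers to embedding a Squeeze-and-Excitation (SE) self-attention mechanism into residual and conventional convolutional networks, but this page carries no implementation details. Below is a minimal PyTorch sketch of a standard SE block (channel-wise squeeze via global average pooling, excitation via a bottleneck MLP with sigmoid gating); the class name `SEBlock` and the reduction ratio of 16 are illustrative conventions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation block: learns per-channel weights
    and rescales the input feature maps with them."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(                      # excitation: bottleneck MLP + sigmoid gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)                    # per-channel descriptor
        w = self.excite(w).view(b, c, 1, 1)               # channel attention weights in (0, 1)
        return x * w                                      # recalibrate the feature maps


# Usage: insert after a convolutional stage, e.g. inside a residual block,
# so the network can reweight its channels before the skip connection.
features = torch.randn(8, 64, 32, 32)                    # dummy batch of feature maps
out = SEBlock(channels=64)(features)                     # same shape, channel-recalibrated
```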

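The abstract also reports a transfer learning strategy that carries knowledge from a baseline network trained on a large dataset over to a smaller target set of 141 devices. The paper's exact pipeline is not given here, so the following is a hedged sketch of one common realisation: load source-domain weights, freeze the feature extractor, and retrain a new classification head. The ResNet-18 backbone, the checkpoint filename, and the learning rate are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_TARGET_DEVICES = 141                                  # target classes reported in the abstract

backbone = models.resnet18(weights=None)                  # stand-in for the paper's baseline network
# Hypothetical checkpoint trained on the large source dataset:
# backbone.load_state_dict(torch.load("baseline_source_weights.pt"))

for param in backbone.parameters():                       # freeze the transferred feature extractor
    param.requires_grad = False

# Replace the classification head for the smaller target dataset.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TARGET_DEVICES)

# Fine-tune only the new head first; deeper layers can be unfrozen later
# with a smaller learning rate if the target data allow it.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```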

Source Journal
Circuits, Systems and Signal Processing
Category: Engineering & Technology (Engineering: Electrical & Electronic)
CiteScore: 4.80
Self-citation rate: 13.00%
Articles per year: 321
Average review time: 4.6 months
Journal description: Rapid developments in the analog and digital processing of signals for communication, control, and computer systems have made the theory of electrical circuits and signal processing a burgeoning area of research and design. The aim of Circuits, Systems, and Signal Processing (CSSP) is to help meet the need for outlets for significant research papers and state-of-the-art review articles in the area. The scope of the journal is broad, ranging from mathematical foundations to practical engineering design. It encompasses, but is not limited to, such topics as linear and nonlinear networks, distributed circuits and systems, multi-dimensional signals and systems, analog filters and signal processing, digital filters and signal processing, statistical signal processing, multimedia, computer aided design, graph theory, neural systems, communication circuits and systems, and VLSI signal processing. The Editorial Board is international, and papers are welcome from throughout the world. The journal is devoted primarily to research papers, but survey, expository, and tutorial papers are also published. Circuits, Systems, and Signal Processing (CSSP) is published twelve times annually.