Title: SMDRL: Self-supervised mobile device representation learning framework for recording source identification from unlabeled data
Authors: Chunyan Zeng, Yuhao Zhao, Zhifeng Wang
DOI: 10.1016/j.eswa.2025.127635
Journal: Expert Systems with Applications, Volume 282, Article 127635 (Journal Article)
Publication date: 2025-04-21
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
Full text: https://www.sciencedirect.com/science/article/pii/S0957417425012576
Citations: 0
Abstract
In mobile recording device source identification, deep learning techniques have been pivotal for extracting deep features from audio signals. Traditional approaches, however, predominantly rely on fully labeled datasets for supervised training, neglecting the vast amounts of unlabeled data typically present in real-world scenarios. This limitation has led to a significant disparity between the performance of existing models and their practical applicability. To bridge this gap, we introduce the Self-supervised Mobile Device Representation Learning (SMDRL) framework. During the pre-training phase, SMDRL utilizes a substantial unlabeled audio dataset enhanced by three novel data augmentation techniques: interpolative noise mixing, time-frequency masking, and partitioned resampling. Employing contrastive learning, these methods facilitate the development of a universal encoder capable of effectively capturing device-specific features from raw audio. Further, we propose the Cross-scale Mobile Device Encoder (CMDEncoder), which integrates a Convolutional Neural Network (CNN) for local feature extraction with Long Short-Term Memory (LSTM) and Transformer-Encoder architectures to handle global-scale information. This encoder is further refined by numerical computations of standard deviations and mean values to enhance global feature interaction. In the fine-tuning phase, the framework employs the Light Weight Enhanced Channel Attention, Propagation in Time-Delay Neural Network (LWECAP-TDNN) classifier, achieving high-precision identification results. Our experimental outcomes demonstrate that the proposed method significantly elevates the accuracy of device source identification, achieving recognition rates of 95.24% and 89.87% on the CCNU Mobile Base and CCNU Mobile Large datasets, respectively. These results represent improvements ranging from 1.44% to 35.69% over baseline methods. 
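Of the three augmentation techniques named above, time-frequency masking is the most conventional; the abstract gives no implementation details, but a minimal SpecAugment-style sketch conveys the idea. Everything here is an assumption for illustration: the input is taken to be a `(n_freq_bins, n_frames)` magnitude spectrogram as a NumPy array, and the mask-width bounds `max_f` / `max_t` are hypothetical defaults, not values from the paper.

```python
import numpy as np

def time_frequency_mask(spec, max_f=8, max_t=20, rng=None):
    """Zero out one random frequency band and one random time span.

    spec: 2-D array of shape (n_freq_bins, n_frames).
    max_f / max_t: upper bounds on mask widths (illustrative defaults).
    Returns an augmented copy; the input is left untouched.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape

    f = rng.integers(0, max_f + 1)         # frequency-mask width (may be 0)
    f0 = rng.integers(0, n_freq - f + 1)   # first masked frequency bin
    out[f0:f0 + f, :] = 0.0

    t = rng.integers(0, max_t + 1)         # time-mask width (may be 0)
    t0 = rng.integers(0, n_time - t + 1)   # first masked frame
    out[:, t0:t0 + t] = 0.0
    return out

spec = np.abs(np.random.randn(80, 200))   # e.g. an 80-bin, 200-frame spectrogram
aug = time_frequency_mask(spec)
assert aug.shape == spec.shape
```

In a contrastive setup such as the one described, two independently augmented views of the same recording would form a positive pair for the encoder.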
To facilitate further research, the source code for this study is made publicly available at https://github.com/CCNUZFW/SMDRL.
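The abstract's mention of refining the encoder with "standard deviations and mean values" matches the common statistics-pooling pattern used in speaker/device embedding networks. The sketch below is a generic illustration of that pattern, not the authors' CMDEncoder code; the frame shape and feature dimension are assumptions.

```python
import numpy as np

def statistics_pooling(frames):
    """Aggregate frame-level features into a single utterance-level vector
    by concatenating the per-dimension mean and standard deviation.

    frames: array of shape (n_frames, feat_dim).
    Returns a vector of shape (2 * feat_dim,).
    """
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

x = np.random.randn(200, 64)   # e.g. 200 frames of 64-dim encoder output
v = statistics_pooling(x)
assert v.shape == (128,)
```

Pooling both moments lets the downstream classifier see not only the average device signature but also its frame-to-frame variability, which is why this pattern appears in ECAPA-TDNN-style architectures like the LWECAP-TDNN classifier named above.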
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.