Detecting audio splicing forgery: A noise-robust approach with Swin Transformer and cochleagram

IF 3.8 | CAS Tier 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Tolgahan Gulsoy, Elif Kanca Gulsoy, Arda Ustubioglu, Beste Ustubioglu, Elif Baykal Kablan, Selen Ayas, Guzin Ulutas, Gul Tahaoglu, Mohamed Elhoseny
Journal of Information Security and Applications, Volume 93, Article 104130. DOI: 10.1016/j.jisa.2025.104130. Published 2025-07-01.

Citations: 0

Abstract

Audio splicing forgery involves cutting specific parts of an audio recording and inserting or combining them into another audio recording. This manipulation technique is often used to create misleading or fake audio content, particularly in digital media environments. Detecting audio splicing forgery is of great importance, especially in forensic analysis, security applications, and media verification processes. In this paper, we present a novel noise-robust method for detecting audio splicing forgery. The proposed method converts audio signals into cochleagram images, which are then input into a Swin Transformer model for training. Following the training process, the model classifies and labels test audio files as either original or fake. In the experiments, the method is tested on datasets of varying durations. The results demonstrate high performance across different datasets, both with and without Gaussian noise, as well as under real-world environmental noise attacks with varying audio durations. For example, under a 30 dB noise condition on 2-second data segments, the model achieved an accuracy of 94.33%, precision of 96.46%, recall of 92.90%, and an F1-score of 94.65%. Under the rain noise condition, the proposed method achieves the highest accuracy of 93.26%, precision of 99.83%, and F1-score of 95.48%.
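The abstract describes converting audio signals into cochleagram images before feeding them to the classifier. As an illustrative sketch only (the paper does not specify its filterbank parameters, so the channel count, ERB spacing, frame length, and log compression below are assumptions, not the authors' exact pipeline), a cochleagram can be approximated with a gammatone filterbank followed by half-wave rectification and per-frame energy pooling:

```python
import numpy as np
from scipy.signal import gammatone, lfilter

def cochleagram(signal, fs, n_channels=64, frame_len=1024, hop=512, fmin=50.0):
    """Crude cochleagram sketch: gammatone filterbank -> half-wave
    rectification -> per-frame RMS energy -> log compression.
    All parameter defaults here are illustrative assumptions."""
    # Center frequencies spaced on the ERB scale between fmin and ~fs/2
    # (Glasberg & Moore ERB constants, as used in Slaney's toolbox).
    ear_q, min_bw = 9.26449, 24.7
    hi = fs / 2 - 1
    idx = np.arange(1, n_channels + 1)
    cfs = -(ear_q * min_bw) + np.exp(
        idx * (-np.log(hi + ear_q * min_bw) + np.log(fmin + ear_q * min_bw))
        / n_channels) * (hi + ear_q * min_bw)

    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.zeros((n_channels, n_frames))
    for ch, cf in enumerate(cfs):
        b, a = gammatone(cf, 'iir', fs=fs)        # 4th-order gammatone filter
        y = np.maximum(lfilter(b, a, signal), 0)  # half-wave rectification
        for t in range(n_frames):
            frame = y[t * hop: t * hop + frame_len]
            out[ch, t] = np.sqrt(np.mean(frame ** 2))  # frame RMS energy
    return np.log1p(out)  # log compression for dynamic range
```

The resulting (channels × frames) array can be rendered as an image and passed to an image classifier such as a Swin Transformer; the classifier side is omitted here since the paper's training configuration is not given in the abstract.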
Source journal
Journal of Information Security and Applications
Category: Computer Science - Computer Networks and Communications
CiteScore: 10.90
Self-citation rate: 5.40%
Articles per year: 206
Review time: 56 days
Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications with relevance to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.