Time Scaling Detection and Estimation in Audio Recordings

M. Pilia, S. Mandelli, Paolo Bestagini, S. Tubaro
Published in: 2021 IEEE International Workshop on Information Forensics and Security (WIFS), 2021-12-07
DOI: 10.1109/WIFS53200.2021.9648389 (https://doi.org/10.1109/WIFS53200.2021.9648389)
Citations: 3

Abstract

The widespread diffusion of user-friendly editing software for audio signals has made audio tampering accessible to anyone. It is therefore increasingly necessary to develop forensic methodologies that verify whether a given audio recording has been digitally manipulated. Among the many available audio editing techniques, a very common one is time scaling, i.e., altering the temporal evolution of an audio signal without affecting its pitch. For instance, time scaling can be used to slow down or speed up speech recordings, enabling the creation of natural-sounding fake speech compositions. In this work, we propose to blindly detect and estimate the time scaling applied to an audio signal. To expose time scaling, we leverage a Convolutional Neural Network that analyzes the Log-Mel Spectrogram and the phase of the Short-Time Fourier Transform of the input audio signal. The proposed technique is tested on different audio datasets, considering various time scaling implementations and challenging cross-test scenarios.
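The two network inputs named in the abstract, the Log-Mel Spectrogram and the Short-Time Fourier Transform phase, can be computed along the following lines. This is a minimal NumPy sketch, not the authors' implementation: the abstract gives no analysis parameters, so the window length, hop size, number of mel bands, and all function names below are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular filters evenly spaced on the mel scale; shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def extract_features(x, sr, n_fft=512, hop=128, n_mels=64):
    """Return (log_mel, phase) for a mono signal x sampled at sr Hz.

    Parameter values are assumptions chosen for illustration only.
    """
    spec = stft(x, n_fft, hop)
    power = np.abs(spec) ** 2
    # Small constant avoids log(0) in silent or empty bands.
    log_mel = np.log(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)
    phase = np.angle(spec)
    return log_mel, phase
```

Both outputs share the same frame axis, so they could be fed to a network as parallel inputs; how the paper actually combines them is not stated in the abstract.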