Time Scaling Detection and Estimation in Audio Recordings

2021 IEEE International Workshop on Information Forensics and Security (WIFS); Pub Date: 2021-12-07; DOI: 10.1109/WIFS53200.2021.9648389

M. Pilia, S. Mandelli, Paolo Bestagini, S. Tubaro


Citations: 3


Time Scaling Detection and Estimation in Audio Recordings

The widespread diffusion of user-friendly editing software for audio signals has made audio tampering accessible to anyone. It is therefore increasingly necessary to develop forensic methodologies for verifying whether a given audio recording has been digitally manipulated. Among the many available audio editing techniques, a very common one is time scaling, i.e., altering the temporal evolution of an audio signal without affecting any pitch component. For instance, this can be used to slow down or speed up speech recordings, thus enabling the creation of natural-sounding fake speech compositions. In this work, we propose to blindly detect and estimate the time scaling applied to an audio signal. To expose time scaling, we leverage a Convolutional Neural Network that analyzes the Log-Mel Spectrogram and the phase of the Short Time Fourier Transform of the input audio signal. The proposed technique is tested on different audio datasets, considering various time scaling implementations and challenging cross-test scenarios.
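To make the two input representations named in the abstract concrete, the following is a minimal sketch of how a Log-Mel Spectrogram and the STFT phase could be computed from a waveform. It is not the authors' pipeline: the abstract gives no parameters, so the sampling rate, FFT size, hop length, number of mel bands, the HTK-style mel formula, and all function names (`stft`, `mel_filterbank`, `extract_features`) are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (n_fft//2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def extract_features(x, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Return (log-mel spectrogram in dB, STFT phase in radians)."""
    spec = stft(x, n_fft, hop)
    power = np.abs(spec) ** 2
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power
    log_mel = 10.0 * np.log10(mel_power + 1e-10)  # small offset avoids log(0)
    phase = np.angle(spec)
    return log_mel, phase

# Hypothetical input: one second of a 440 Hz tone standing in for a recording.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
log_mel, phase = extract_features(x, sr)
print(log_mel.shape, phase.shape)
```

In a setup like the one the abstract describes, arrays of this kind would be the inputs handed to the network; how they are normalized, cropped, or combined is not stated in the abstract and is left open here.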

