Identification of Weakly Pitch-Shifted Voice Based on Convolutional Neural Network

Yongchao Ye, Lingjie Lao, Diqun Yan, Rangding Wang
Int. J. Digit. Multim. Broadcast., published 2020-01-06. DOI: 10.1155/2020/8927031 (https://doi.org/10.1155/2020/8927031)
Citations: 6

Abstract

Pitch shifting is a common voice editing technique in which the original pitch of a digital voice is raised or lowered. A malicious attacker can abuse it to conceal their true identity. Existing forensic detection methods are no longer effective for weakly pitch-shifted voice. In this paper, we propose a convolutional neural network (CNN) that detects not only strongly pitch-shifted voice but also weakly pitch-shifted voice, for which the shifting factor is within ±4 semitones. Specifically, linear frequency cepstral coefficients (LFCC) are computed from power spectra, and their dynamic coefficients are extracted as the discriminative features. The CNN model is carefully designed, with particular attention to the input feature map, the activation function, and the network topology. We evaluated the algorithm on voices from two datasets processed with three pitch-shifting software tools. Extensive experimental results show that the algorithm achieves high detection rates for both binary and multi-class classification.
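
The feature pipeline named in the abstract (LFCC from power spectra, plus dynamic coefficients) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no frame length, hop size, filter count, delta formula, or feature-map layout, so every parameter value and every function name below (`linear_filterbank`, `dct_matrix`, `lfcc`, `delta`, `feature_map`) is an assumption chosen for illustration. Stacking static, delta, and delta-delta coefficients as three channels is likewise only one plausible way to form a CNN input.

```python
import numpy as np


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on a linear frequency axis (0 to Nyquist)."""
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)  # filter edges in Hz
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)     # FFT bin centres in Hz
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_matrix(n_out, n_in):
    """Unnormalised DCT-II basis with shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))


def lfcc(signal, sample_rate, frame_len=512, hop=256, n_filters=20, n_ceps=20):
    """LFCC per frame: power spectrum -> linear filterbank -> log -> DCT.

    All default values are assumptions, not taken from the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2            # power spectrum
    energies = power @ linear_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)                     # offset avoids log(0)
    return log_energies @ dct_matrix(n_ceps, n_filters).T


def delta(features):
    """Dynamic coefficients as a central difference over time (assumed formula)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def feature_map(signal, sample_rate):
    """Stack static, delta, and delta-delta LFCC as channels (assumed layout)."""
    static = lfcc(signal, sample_rate)
    d1 = delta(static)
    d2 = delta(d1)
    return np.stack([static, d1, d2], axis=0)


# Usage on a synthetic one-second 220 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
fmap = feature_map(tone, sr)
print(fmap.shape)  # (channels, frames, coefficients)
```

The filters are spaced linearly rather than on the mel scale, which is what distinguishes LFCC from MFCC; the linear spacing keeps uniform resolution across the whole band instead of compressing the higher frequencies.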