A scheme discriminating between synthetic speech and normal speech

Jilun Chen, Weiqiang Zhang, Jia Liu
Published in: 2016 International Conference on Audio, Language and Image Processing (ICALIP), 2016-07-01
DOI: 10.1109/ICALIP.2016.7846613 (https://doi.org/10.1109/ICALIP.2016.7846613)
Citations: 0

Abstract

This paper develops a system that automatically distinguishes natural speech from synthetic speech, with a focus on feature selection. We consider the commonly used Mel-Frequency Cepstral Coefficients (MFCC) as well as other features, such as Relative Phase Shift (RPS) and pitch tuned for Automatic Speech Recognition (ASR). We find that some of these features are complementary in the task of discriminating synthetic from natural speech. A Gaussian Mixture Model Support Vector Machine (GMM-SVM) system is applied as the classifier, with its feature input modified and compared against the feature input used in speaker recognition. Experiments on a data set of LibriSpeech recordings versus speech from online Text-to-Speech (TTS) synthesis platforms verify the effectiveness of combining these features.
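
The abstract does not say how the GMM-SVM input is built. As a rough sketch only, the snippet below assumes the construction common in speaker recognition: the means of a background GMM are MAP-adapted to one utterance and stacked into a fixed-length supervector, which an SVM would then classify as natural or synthetic. The function name, the relevance factor of 16, and the toy random data are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def map_adapted_supervector(frames, weights, means, variances, relevance=16.0):
    """Stack MAP-adapted GMM means into one fixed-length supervector.

    frames:    (T, D) per-frame features, e.g. MFCC with RPS or pitch appended
    weights:   (K,)   mixture weights of a background GMM
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    relevance: MAP relevance factor (assumed value, not from the paper)
    """
    # Log-density of every frame under every diagonal-covariance component.
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                                  # (T, K)

    # Component posteriors per frame, normalised in a numerically stable way.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order statistics of the utterance.
    counts = post.sum(axis=0)                                          # (K,)
    data_means = (post.T @ frames) / np.maximum(counts, 1e-10)[:, None]

    # Interpolate between the utterance statistics and the background means.
    alpha = (counts / (counts + relevance))[:, None]
    adapted = alpha * data_means + (1.0 - alpha) * means
    return adapted.reshape(-1)


# Toy usage with random numbers standing in for real features and a real GMM.
rng = np.random.default_rng(0)
K, D = 4, 6
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
utterance = rng.normal(size=(200, D))

sv = map_adapted_supervector(utterance, weights, means, variances)
print(sv.shape)  # (24,)
```

Under this assumption, feature combination amounts to changing what goes into `frames` (for example, concatenating MFCC with RPS per frame), while the supervector and SVM stages stay the same.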