A scheme discriminating between synthetic speech and normal speech

2016 International Conference on Audio, Language and Image Processing (ICALIP) Pub Date : 2016-07-01 DOI:10.1109/ICALIP.2016.7846613

Jilun Chen, Weiqiang Zhang, Jia Liu

Citations: 0

Abstract

This paper develops a system that automatically distinguishes natural speech from synthetic speech. The issue of feature selection is considered. We take the commonly used Mel-Frequency Cepstral Coefficient (MFCC) feature into consideration, as well as other features such as Relative Phase Shift (RPS) and pitch tuned for Automatic Speech Recognition (ASR). We found that some features are complementary in the task of discriminating synthetic from natural speech. A Gaussian Mixture Model-Support Vector Machine (GMM-SVM) system is applied as the classifier; its feature input is modified from, and compared with, the feature input used in speaker recognition. Experiments on a data set of LibriSpeech recordings versus speech from online Text-to-Speech (TTS) synthesis platforms verified the effectiveness of combining these features.
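The abstract names a GMM-SVM classifier without describing its internals. As a rough illustration only, the sketch below follows the common GMM-supervector recipe from speaker recognition: fit a background GMM on all frames, MAP-adapt its means to each utterance, stack the adapted means into one vector, and train a linear SVM on those vectors. Everything specific here is an assumption rather than the authors' configuration: the random frames stand in for real MFCC/RPS/pitch features, and `fake_utterance`, `supervector`, the 13-dimensional frames, the 4 mixture components, and the relevance factor of 16 are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_DIM = 13   # assumed per-frame feature size (e.g. 13 MFCCs); not from the paper
N_MIX = 4    # assumed number of GMM components; not from the paper


def fake_utterance(shift, n_frames=200):
    """Hypothetical stand-in for a feature extractor: (n_frames, N_DIM) random frames."""
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, N_DIM))


def supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the background GMM means to one utterance and stack them."""
    post = ubm.predict_proba(frames)             # (n_frames, N_MIX) responsibilities
    n_k = post.sum(axis=0)                       # soft frame count per component
    mean_k = (post.T @ frames) / np.maximum(n_k, 1e-8)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]   # data-dependent adaptation weight
    adapted = alpha * mean_k + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                       # length N_MIX * N_DIM


# Toy data: label 0 = "natural", label 1 = "synthetic" (simulated by a mean shift).
utterances = [fake_utterance(0.0) for _ in range(20)] + \
             [fake_utterance(0.5) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

# Background model trained on the pooled frames of all utterances.
ubm = GaussianMixture(n_components=N_MIX, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(utterances))

# One supervector per utterance, then a linear SVM on top.
X = np.stack([supervector(ubm, u) for u in utterances])
clf = SVC(kernel="linear").fit(X, labels)
pred = clf.predict(X)
```

With real audio, `fake_utterance` would be replaced by an actual feature extractor. One simple way to combine complementary features in such a setup is to concatenate them per frame before the GMM stage; the abstract does not say whether the authors fuse features this way or at the score level.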
