Modelling Glottal Flow Derivative Signal for Detection of Replay Speech Samples

2019 National Conference on Communications (NCC) · Pub Date: 2019-02-01 · DOI: 10.1109/NCC.2019.8732249

Jagabandhu Mishra, D. Pati, S. Prasanna



Abstract

Automatic speaker verification systems are known to be vulnerable to replay speech. The present work detects replay speech using the information available in the glottal flow derivative (GFD) signal. In signal processing terms, the speech signal can be represented as the response of a vocal-tract system excited by a source in the form of glottal flow. Recording and replay devices distort the spectral characteristics of the naturally uttered speech sample, which in turn distorts the corresponding GFD signal. In this work, the GFD signals are parameterized using standard mel filters, and Gaussian mixture models are built for detection. Although various methods are available, correlation analysis shows that, in the context of the present work, the dynamic programming phase slope algorithm (DYPSA) is relatively more effective for estimating the GFD signals. The experimental studies are carried out on the ASVSpoof2017 database. The proposed glottal flow derivative mel frequency cepstral coefficients (GFDMFCC) feature yields an equal error rate (EER) of 20.53%. This performance is poorer than that of speech- and residual-based features, mainly because the estimated GFD signal lacks fine-structure information. However, in fusion with the speech-signal-based constant-Q cepstral coefficients (CQCC) feature, the GFDMFCC feature provides an improvement of 10.30% with reference to the conventional residual feature. This shows the usefulness of modelling GFD signals for detecting replay signals.
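The abstract reports results as an equal error rate and mentions fusing the GFDMFCC feature with CQCC. As a rough illustration of what those two terms mean, the sketch below computes an EER from detection scores and combines two score lists with a weighted sum. It is not the authors' code. It assumes that a higher score means "more likely genuine" (for example, a log-likelihood ratio between a genuine-speech and a replay-speech Gaussian mixture model), and it assumes a simple linear score-level fusion, which the abstract does not specify. The function names `equal_error_rate` and `fuse_scores`, the `weight` parameter, and all score values are hypothetical.

```python
from typing import List, Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     replay: Sequence[float]) -> Tuple[float, float]:
    """Return (EER, threshold) for the given trial scores.

    Assumption: a higher score means the trial is more likely genuine.
    The EER is taken at the candidate threshold where the false acceptance
    rate and the false rejection rate are closest, as their average.
    """
    best_gap, best_eer, best_threshold = None, 0.0, 0.0
    for t in sorted(set(genuine) | set(replay)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: replay trials scored at or above the threshold.
        far = sum(s >= t for s in replay) / len(replay)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2.0, t
    return best_eer, best_threshold


def fuse_scores(scores_a: Sequence[float],
                scores_b: Sequence[float],
                weight: float) -> List[float]:
    """Weighted-sum fusion of two per-trial score lists (hypothetical scheme)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


# Made-up scores, for illustration only.
genuine_scores = [2.0, 1.5, 0.3, 1.0]
replay_scores = [-1.0, 0.5, -0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, replay_scores)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

In practice the fusion weight would be tuned on a development set, and the two systems' scores would normally be brought to a comparable range before being summed.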

