{"title":"On Breathing Pattern Information in Synthetic Speech","authors":"Z. Mostaani, M. Magimai.-Doss","doi":"10.21437/interspeech.2022-10271","DOIUrl":null,"url":null,"abstract":"The respiratory system is an integral part of human speech production. As a consequence, there is a close relation between respiration and speech signal, and the produced speech signal carries breathing pattern related information. Speech can also be generated using speech synthesis systems. In this paper, we investigate whether synthetic speech carries breathing pattern related information in the same way as natural human speech. We address this research question in the framework of logical-access presentation attack detection using embeddings extracted from neural networks pre-trained for speech breathing pattern estimation. Our studies on ASVSpoof 2019 challenge data show that there is a clear distinction between the extracted breathing pattern embedding of natural human speech and syn-thesized speech, indicating that speech synthesis systems tend to not carry breathing pattern related information in the same way as human speech. Whilst, this is not the case with voice conversion of natural human speech.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"2768-2772"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-10271","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
The respiratory system is an integral part of human speech production. As a consequence, there is a close relation between respiration and speech signal, and the produced speech signal carries breathing pattern related information. Speech can also be generated using speech synthesis systems. In this paper, we investigate whether synthetic speech carries breathing pattern related information in the same way as natural human speech. We address this research question in the framework of logical-access presentation attack detection using embeddings extracted from neural networks pre-trained for speech breathing pattern estimation. Our studies on ASVSpoof 2019 challenge data show that there is a clear distinction between the extracted breathing pattern embedding of natural human speech and syn-thesized speech, indicating that speech synthesis systems tend to not carry breathing pattern related information in the same way as human speech. Whilst, this is not the case with voice conversion of natural human speech.