Significance of Prosody Modification in Privacy Preservation on speaker verification
Ayush Agarwal, Amitabh Swain, Jagabandhu Mishra, S. Prasanna
2022 National Conference on Communications (NCC), published 2022-05-24
DOI: 10.1109/NCC55593.2022.9806769 (https://doi.org/10.1109/NCC55593.2022.9806769)
Citations: 1
Abstract
Privacy is a major concern for users before they share their data. Various methods have been proposed in the literature for protecting the privacy of speech data. Previous work on protecting speaker identity targeted speech applications such as automatic speech recognition (ASR) and speech analysis, for which speaker identity is not essential during processing. The objective of this work is to provide privacy for a task in which speaker identity is essential at processing time. Specifically, this work protects the speaker identity information present in speech signals while performing automatic speaker verification (ASV). To achieve this, it proposes a prosody modification based approach. The proposed approach conceals the speaker identity from human perception by changing the pitch of the speech utterances with a pitch modification factor of $\alpha \geq 1$. At the same time, the ASV system provides consistent performance irrespective of the change in pitch (i.e., for $\alpha \geq 1$). This is demonstrated through experiments on the TIMIT and IITG-MV databases. A subjective study was also performed to verify the extent of speaker anonymization with respect to human listeners. The subjective study evaluates performance in terms of mean opinion score (MOS), and the observed MOS indicates that the proposed approach is able to conceal the speaker's identity.
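To make the role of the pitch modification factor $\alpha$ concrete, the following is a minimal sketch of pitch scaling on a synthetic tone. It is not the prosody modification method used in the paper, which the abstract does not specify. It assumes the simplest possible stand-in, plain resampling with linear interpolation, which raises the pitch by $\alpha$ but also shortens the signal by the same factor. The function names `scale_pitch_by_resampling` and `estimate_f0`, the 100 Hz test tone, and the value $\alpha = 1.5$ are all hypothetical choices made for illustration.

```python
import math


def scale_pitch_by_resampling(samples, alpha):
    """Naively scale pitch by alpha by reading the signal alpha times faster.

    Assumption: simple linear-interpolation resampling. Unlike a proper
    prosody modification method, this also shortens the duration by 1/alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    last = len(samples) - 1
    n_out = int(last / alpha) + 1
    out = []
    for i in range(n_out):
        pos = i * alpha              # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def estimate_f0(samples, sample_rate):
    """Rough pitch estimate for a clean tone: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


# Hypothetical test signal: one second of a 100 Hz sine at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

alpha = 1.5  # illustrative pitch modification factor, chosen so that alpha >= 1
shifted = scale_pitch_by_resampling(tone, alpha)
ratio = estimate_f0(shifted, sample_rate) / estimate_f0(tone, sample_rate)
print(f"measured pitch ratio: {ratio:.2f}")
```

The measured ratio should come out close to the chosen $\alpha$. Duration-preserving pitch modification on real speech needs a more careful method than this, so the sketch only illustrates what the factor $\alpha$ means.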