Comparison of Korean Speech De-identification Performance of Speech De-identification Model and Broadcast Voice Modulation

Korean Institute of Smart Media Pub Date : 2023-03-30 DOI:10.30693/smj.2023.12.2.56

Seung Min Kim, Dae Eol Park, Dae Seon Choi



Abstract

In broadcasts such as news and coverage programs, voices are modulated to protect the identity of informants. Pitch adjustment is a commonly used voice modulation method, but the original voice can easily be restored by adjusting the pitch back. Because broadcast voice modulation methods therefore cannot properly protect the speaker's identity and are vulnerable from a security standpoint, a new voice modulation method is needed to replace them. In this paper, we take the Lightweight speech de-identification model as the evaluation target and compare its speech de-identification performance with that of the pitch-based broadcast voice modulation method. Of the six modulation methods in the Lightweight speech de-identification model, we selected three, McAdams, Resampling, and Vocal Tract Length Normalization (VTLN), and evaluated their de-identification performance on Korean speech against broadcast voice modulation using a human test and an Equal Error Rate (EER) test. The experimental results show that the VTLN modulation method achieved higher de-identification performance in both the human test and the EER test. We conclude that the modulation methods of the Lightweight model provide sufficient de-identification performance for Korean speech and can replace the less secure broadcast voice modulation.
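As an illustration of the EER test mentioned in the abstract, the following is a minimal sketch of how an Equal Error Rate could be computed from speaker-verification similarity scores. The abstract does not describe the scoring pipeline, so the function name `equal_error_rate`, the score arrays, and the threshold sweep are all assumptions made for illustration and not the authors' procedure. In de-identification evaluations of this kind, a higher EER for the verification system is generally read as stronger de-identification, since the system finds it harder to match modulated speech to the original speaker.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for two sets of similarity scores.

    genuine  : scores for trials where both utterances are the same speaker
    impostor : scores for trials where the speakers differ
    Higher scores are assumed to mean "more likely the same speaker".
    This is a hypothetical helper, not code from the paper.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Candidate thresholds: every distinct score that was observed.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False acceptance rate: impostor trials accepted (score >= threshold).
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials rejected (score < threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # The EER is taken at the threshold where the two rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores only; these are not results from the paper.
    eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
    print(f"EER = {eer:.2f} at threshold {thr:.2f}")
```

The sketch picks the observed threshold where the two error rates are closest rather than interpolating between thresholds, which is adequate for illustration but coarser than what an evaluation toolkit would typically report.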


