Studying the Effect of Face Masks in Identifying Speakers using LSTM

Mohamed Bader, I. Shahin, A. Ahmed, N. Werghi
2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA), published 2022-11-23
DOI: https://doi.org/10.1109/ICECTA57148.2022.9990479

Abstract

During the COVID-19 pandemic, wearing Respiratory Protection Masks (RPMs) that cover both the nose and the mouth became standard practice around the world. The consequences of wearing RPMs for the perception and production of speech are rapidly becoming more prominent. In particular, face masks attenuate voice signals, and this alteration affects speech-processing technologies such as Automatic Speaker Verification (ASV) and speech-to-text conversion. An intervention by a deep learning-based algorithm is considered vital to remedy the resulting problems in speaker-based technology. Therefore, the proposed framework implements a speaker identification system to examine the effect of masks. First, speech signals are captured, pre-processed, and augmented using a variety of data augmentation techniques. Afterward, different Mel-Frequency Cepstral Coefficient (MFCC) features are extracted and fed into a Long Short-Term Memory (LSTM) network to identify speakers. The system's overall performance is assessed using accuracy, precision, recall, and F1-score, which reach 93%, 93.3%, 92.2%, and 92.8%, respectively. These results are preliminary and are subject to further improvement through data expansion and the use of multiple optimization techniques.
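The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how precision, recall, and F1-score are averaged across speakers, so macro averaging (an unweighted mean over speakers) is assumed here; the function name `macro_metrics` and the toy speaker labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct identification of this speaker
        else:
            fp[pred] += 1    # predicted speaker was wrongly claimed
            fn[truth] += 1   # true speaker was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical toy labels for two speakers, for illustration only.
    true_speakers = ["spk_a", "spk_a", "spk_b", "spk_b"]
    predicted = ["spk_a", "spk_b", "spk_b", "spk_b"]
    acc, prec, rec, f1 = macro_metrics(true_speakers, predicted)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

In a speaker identification setting, each label would be a speaker identity and each prediction the class chosen by the LSTM for one utterance. A weighted average would give different numbers when speakers have unequal amounts of test data.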