Studying the Effect of Face Masks in Identifying Speakers using LSTM
Mohamed Bader, I. Shahin, A. Ahmed, N. Werghi
2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA), 23 November 2022
DOI: 10.1109/ICECTA57148.2022.9990479
Citations: 0
Abstract
During the COVID-19 pandemic, wearing Respiratory Protection Masks (RPMs) that cover both the nose and the mouth has become standard practice for people all around the world. The consequences of wearing RPMs for the perception and production of spoken communication are rapidly becoming more prominent. In particular, face masks attenuate voice signals, and this alteration affects speech-processing technologies such as Automatic Speaker Verification (ASV) and speech-to-text conversion. A deep learning-based algorithm is therefore considered vital to address the improper use of speaker-based technology when speakers wear masks. To this end, the proposed framework implements a speaker identification system to examine the effect of masks. First, the speech signals were captured, pre-processed, and augmented using a variety of data augmentation techniques. Various Mel-Frequency Cepstral Coefficient (MFCC) features were then extracted and fed into a Long Short-Term Memory (LSTM) network to identify speakers. The system's overall performance was assessed using accuracy, precision, recall, and F1-score, which reached 93%, 93.3%, 92.2%, and 92.8%, respectively. These results are still preliminary and will be improved in future work through data expansion and the use of multiple optimization techniques.
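The four reported figures are standard multi-class classification metrics, where each speaker is one class. The abstract does not say how per-speaker scores were combined, so the sketch below assumes macro averaging (an unweighted mean over speakers), which is one common choice. The function name `macro_metrics` and the toy speaker labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging over every label that appears in either
    y_true or y_pred (the paper does not state its averaging scheme).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_spk, pred_spk in zip(y_true, y_pred):
        if true_spk == pred_spk:
            tp[true_spk] += 1   # correct identification
        else:
            fp[pred_spk] += 1   # predicted speaker wrongly credited
            fn[true_spk] += 1   # true speaker missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for spk in labels:
        p = tp[spk] / (tp[spk] + fp[spk]) if tp[spk] + fp[spk] else 0.0
        r = tp[spk] / (tp[spk] + fn[spk]) if tp[spk] + fn[spk] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-speaker F1 scores
    }


# Toy example with three hypothetical speakers.
y_true = ["spk1", "spk1", "spk2", "spk2", "spk3", "spk3"]
y_pred = ["spk1", "spk2", "spk2", "spk2", "spk3", "spk1"]
print(macro_metrics(y_true, y_pred))
```

Because macro F1 averages the per-speaker F1 scores, it need not equal the harmonic mean of macro precision and macro recall. In the toy data, macro F1 is about 0.656, while the harmonic mean of macro precision (about 0.722) and macro recall (about 0.667) is about 0.693.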