{"title":"基于混合CNN-LSTM的人脸识别框架","authors":"Mohamed Bader, I. Shahin, A. Ahmed, N. Werghi","doi":"10.1109/ICECTA57148.2022.9990138","DOIUrl":null,"url":null,"abstract":"Following the declaration of COVID-19 as a worldwide pandemic, hindering a multitude number of lives, face mask exploitation has become extremely crucial to barricade the emanation of the virus. The masks available in the market are of various sorts and materials and tend to affect the speaker’s vocal characteristics. As a result, optimum communication may be hampered. In the proposed framework, a speaker identification model has been employed. Also, the speech corpus has been captured. Then, the spectrograms were obtained and passed through a two-stage pre-processing. The first stage includes the audio samples. In contrast, the second stage has targeted the spectrograms. Afterward, the generated spectrograms were passed into a hybrid Convolutional Neural Network- Long Short-Term Memory (CNN-LSTM) model to perform the classification. Our proposed framework has shown its capability to identify speakers while they are wearing face masks. Moreover, the system has been evaluated on the collected dataset, where it has attained 92.7%, 92.62%, 87.71%, and 88.26% in terms of accuracy, precision, recall, and F1-score, respectively. The acquired findings are still preliminary and will be refined further in the future by data expansion and the employment of numerous optimization approaches.","PeriodicalId":337798,"journal":{"name":"2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Hybrid CNN-LSTM Speaker Identification Framework for Evaluating the Impact of Face Masks\",\"authors\":\"Mohamed Bader, I. Shahin, A. Ahmed, N. Werghi\",\"doi\":\"10.1109/ICECTA57148.2022.9990138\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Following the declaration of COVID-19 as a worldwide pandemic, hindering a multitude number of lives, face mask exploitation has become extremely crucial to barricade the emanation of the virus. The masks available in the market are of various sorts and materials and tend to affect the speaker’s vocal characteristics. As a result, optimum communication may be hampered. In the proposed framework, a speaker identification model has been employed. Also, the speech corpus has been captured. Then, the spectrograms were obtained and passed through a two-stage pre-processing. The first stage includes the audio samples. In contrast, the second stage has targeted the spectrograms. Afterward, the generated spectrograms were passed into a hybrid Convolutional Neural Network- Long Short-Term Memory (CNN-LSTM) model to perform the classification. Our proposed framework has shown its capability to identify speakers while they are wearing face masks. Moreover, the system has been evaluated on the collected dataset, where it has attained 92.7%, 92.62%, 87.71%, and 88.26% in terms of accuracy, precision, recall, and F1-score, respectively. 
The acquired findings are still preliminary and will be refined further in the future by data expansion and the employment of numerous optimization approaches.\",\"PeriodicalId\":337798,\"journal\":{\"name\":\"2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICECTA57148.2022.9990138\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Electrical and Computing Technologies and Applications (ICECTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECTA57148.2022.9990138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hybrid CNN-LSTM Speaker Identification Framework for Evaluating the Impact of Face Masks
Following the declaration of COVID-19 as a worldwide pandemic, which has affected countless lives, face mask use has become crucial to curbing the spread of the virus. The masks available on the market come in many types and materials and tend to alter the speaker's vocal characteristics, so effective communication may be hampered. The proposed framework employs a speaker identification model. A speech corpus was first collected, spectrograms were then generated, and the data passed through a two-stage pre-processing: the first stage operates on the raw audio samples, while the second targets the spectrograms themselves. Afterward, the generated spectrograms were fed into a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model to perform the classification. The proposed framework has shown its capability to identify speakers while they are wearing face masks. Evaluated on the collected dataset, the system attained 92.7% accuracy, 92.62% precision, 87.71% recall, and 88.26% F1-score. These findings are still preliminary and will be refined in future work through data expansion and the use of additional optimization approaches.
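As a concrete illustration of the spectrogram step described above, the Python sketch below shows how a log-magnitude spectrogram can be produced from a recorded utterance with librosa. The abstract does not disclose the authors' sampling rate, window size, silence handling, or normalization, so the helper name `audio_to_log_spectrogram` and every parameter here are illustrative assumptions rather than the paper's actual pre-processing.

```python
# Minimal sketch of spectrogram generation for one audio sample (assumed settings,
# not the authors' configuration).
import numpy as np
import librosa

def audio_to_log_spectrogram(path, sr=16000, n_fft=512, hop_length=256):
    """Load an utterance, trim leading/trailing silence, and return a normalized
    log-magnitude spectrogram suitable as an image-like CNN input."""
    y, _ = librosa.load(path, sr=sr)        # stage 1: operate on the raw audio samples
    y, _ = librosa.effects.trim(y)          # example audio-level pre-processing step
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    spec = librosa.amplitude_to_db(np.abs(stft), ref=np.max)  # stage 2: spectrogram-level processing
    # Scale to [0, 1] so all spectrograms share a common input range.
    spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)
    return spec
```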
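The processed spectrograms are then classified by the hybrid CNN-LSTM model. Since the abstract does not report the network architecture, the tf.keras sketch below is only a minimal, assumed layout of such a hybrid: a small convolutional front end extracts local time-frequency features, the feature maps are reshaped into a sequence over time frames, and an LSTM layer models the temporal dynamics before a softmax output predicts the speaker identity. The function name `build_cnn_lstm`, the layer sizes, and the number of speakers are placeholders.

```python
# Assumed hybrid CNN-LSTM spectrogram classifier; layer counts and sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(input_shape=(128, 128, 1), num_speakers=10):
    inputs = layers.Input(shape=input_shape)                       # (freq, time, channels)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Use the time axis as the sequence dimension for the LSTM:
    # collapse frequency and channel axes into one feature vector per time frame.
    _, f, t, c = x.shape
    x = layers.Permute((2, 1, 3))(x)                               # (time, freq, channels)
    x = layers.Reshape((t, f * c))(x)
    x = layers.LSTM(128)(x)                                        # temporal modelling
    outputs = layers.Dense(num_speakers, activation="softmax")(x)  # speaker identity
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model built this way could be trained on spectrogram/label pairs with model.fit and then scored on a held-out split, mirroring the evaluation reported in the abstract.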
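Finally, the reported figures (92.7% accuracy, 92.62% precision, 87.71% recall, 88.26% F1-score) are standard multi-class classification metrics and can be computed from true and predicted speaker labels as sketched below. The abstract does not state which averaging scheme the authors used, so the macro averaging here is an assumption.

```python
# Hedged sketch of computing the four reported metrics with scikit-learn;
# average="macro" is an assumed choice, not stated in the paper.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```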