N. Jamal, N. Fuad, Shahnoor Shanta, M. N. A. Sha'abani
Monaural Speech Enhancement using Deep Neural Network with Cross-Speech Dataset
2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), published 2021-09-13
DOI: 10.1109/ICSIPA52582.2021.9576789 (https://doi.org/10.1109/ICSIPA52582.2021.9576789)
Abstract
The Deep Neural Network (DNN)-based mask estimation approach is an emerging technique in monaural speech enhancement. It enhances speech in a noisy background by estimating whether speech or noise is dominant in each frame of the noisy speech signal, and it can construct complex models for nonlinear processing. However, a limitation of the DNN-based mask algorithm is its generalization beyond the targeted population. Past research has focused on a single target dataset because audio recording sessions are time-consuming. Thus, in this work, different recording conditions were used to study the performance of the DNN-based mask estimation approach. The findings revealed that a test dataset in a different language, as well as different recording conditions, may not have a large impact on speech enhancement performance, since the algorithm learns only the noise information. Nevertheless, speech enhancement performance is promising when the trained model has been designed properly, especially given the limited sample variation in the input dataset used during training.
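
To make the idea of deciding whether speech or noise dominates concrete, the following is a minimal sketch of an ideal binary mask, one common training target for mask estimation networks. The abstract does not state which mask type, threshold, or features the authors used, so the function names `ideal_binary_mask` and `apply_mask`, the 0 dB threshold, and the toy magnitudes are all assumptions made for illustration.

```python
import numpy as np


def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0, eps=1e-12):
    """Return 1.0 where speech dominates noise, else 0.0.

    Illustrative only: the 0 dB threshold is an assumption, not a value
    taken from the paper.
    """
    speech_power = np.asarray(speech_mag, dtype=float) ** 2
    noise_power = np.asarray(noise_mag, dtype=float) ** 2
    # Local signal-to-noise ratio of every time-frequency unit, in dB.
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > threshold_db).astype(float)


def apply_mask(noisy_mag, mask):
    """Keep speech-dominant units and zero out noise-dominant ones."""
    return np.asarray(noisy_mag, dtype=float) * mask


# Toy magnitude spectrograms: rows are frequency bins, columns are frames.
speech = np.array([[2.0, 0.1], [0.5, 3.0]])
noise = np.array([[1.0, 1.0], [1.0, 1.0]])

mask = ideal_binary_mask(speech, noise)
# Adding magnitudes is a simplification of how speech and noise mix.
enhanced = apply_mask(speech + noise, mask)
```

In a DNN-based system the network would be trained to predict such a mask from features of the noisy signal alone, because the clean speech and noise are not available separately at test time.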