Title: Real Time Blind Audio Source Separation Based on Machine Learning Algorithms
Authors: A. Alghamdi, G. Healy, Hoda A. Abdelhafez
DOI: 10.1109/NILES50944.2020.9257891
Published in: 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES)
Publication date: 2020-10-24
Citations: 3
Abstract
Machine learning algorithms such as Conv-TasNet and Demucs can separate two interfering signals, such as music and speech, without any prior information about the mixing operation. Conv-TasNet is a fully convolutional time-domain audio separation network, while Demucs is a newer waveform-to-waveform model that employs a technique similar to audio generation models and has a larger decoder capacity. The criteria for comparing these algorithms are high-quality signal separation (no artefacts) and low execution-time delay. This research examined both algorithms in four experiments: music and male speech, music and female speech, music and conversation, and music and child speech. The results were evaluated using mir_eval together with R-squared, root mean square error (RMSE), and mean absolute error (MAE) scores. Conv-TasNet achieved the highest SDR score for music in the music-and-female experiment, along with a high SDR score in the child experiment. With the Demucs algorithm, the SDR value for music in the music-and-female experiment was also high (7.8), while the child experiment had the highest SDR value (8.15). In terms of average execution time, Conv-TasNet was seven times faster than Demucs. RMSE and MAE were also used to measure accuracy: RMSE squares the errors and so penalizes large deviations more heavily, while MAE computes the average magnitude of the errors between observed and predicted data. Both algorithms showed excellent results and high accuracy in the separation process.
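The evaluation metrics mentioned in the abstract can be sketched in Python. This is a minimal illustration, assuming mono signals held as NumPy arrays; the paper itself uses the mir_eval toolkit's BSS Eval metrics, whereas the `sdr_simple` function below is a simplified ratio-of-energies version written for illustration (it treats the entire residual as distortion and does no permutation or scaling alignment):

```python
import numpy as np

def sdr_simple(reference, estimate):
    """Simplified signal-to-distortion ratio in dB (higher is better).
    Illustrative only: mir_eval's bss_eval_sources additionally handles
    source permutation and scaling; here the whole residual counts as
    distortion."""
    distortion = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2))

def rmse(observed, predicted):
    """Root mean square error: errors are squared before averaging,
    so large errors dominate the score."""
    return np.sqrt(np.mean((observed - predicted) ** 2))

def mae(observed, predicted):
    """Mean absolute error: the average magnitude of the errors."""
    return np.mean(np.abs(observed - predicted))

# Toy check: an estimate at 90% of the reference amplitude leaves a
# residual with 1% of the reference energy -> 10*log10(100) = 20 dB.
ref = np.sin(np.linspace(0, 2 * np.pi, 1000))
est = 0.9 * ref
print(round(sdr_simple(ref, est), 1))  # 20.0
```

A higher SDR means less of the separated signal's energy is distortion, which is why the abstract reports the per-experiment SDR values (e.g. 7.8 and 8.15 for Demucs) as its headline separation-quality numbers.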