Music Source Separation: A Guide
Rudranil Das, Deepti Deshwal, P. Sangwan, Neelam Nehra
2021 International Conference on Industrial Electronics Research and Applications (ICIERA)
Published: 2021-12-22
DOI: 10.1109/ICIERA53202.2021.9726721 (https://doi.org/10.1109/ICIERA53202.2021.9726721)
Abstract
With the transformation of India's telecom industry and the rapidly growing user base of short-video applications such as YouTube Shorts and Instagram Reels, along with indigenous apps such as MX Takatak and Josh, demand for high-quality music has risen as well. A captivating, snappy piece of background music is the foundation of content on all of these apps, yet such music is often not readily available. Music Source Separation (MSS) addresses this need. MSS aims to separate a piece of music into its constituent components with the least possible overlap between them. These components (stems) include vocals, bass, drums, and other accompaniment. The cocktail party effect, in which a listener picks out a single voice in a noisy room, is a familiar illustration of the problem. MSS can be approached with time-domain (waveform) methods or with spectrogram-based methods. This paper reviews and compares existing deep learning-based algorithms: the time-domain models Conv-TasNet and Demucs and the spectrogram-based Open-Unmix. We have also implemented Demucs, a well-known convolutional architecture, and achieved a signal-to-distortion ratio (SDR) of 7.2 dB on the MUSDB18 dataset.
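To give a feel for what an SDR value measures, here is a minimal sketch of the basic signal-to-distortion ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), for a reference stem s and an estimate ŝ. This is an illustration only: the function name `sdr_db`, the `eps` stabiliser, and the toy sine signal are assumptions made here, and published MUSDB18 scores are normally computed with the BSS Eval toolkit, which uses a more elaborate definition than this one.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Basic signal-to-distortion ratio in dB for two equal-length signals.

    Hypothetical helper for illustration; not the BSS Eval metric.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of the difference between the true source and the estimate.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero when the estimate is perfect.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: a one-second 440 Hz sine "stem" and an estimate whose
# amplitude is 10% too low.
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
estimate = [0.9 * r for r in reference]

print(f"SDR: {sdr_db(reference, estimate):.2f} dB")  # prints "SDR: 20.00 dB"
```

In this toy case the error carries 1% of the signal energy, so the ratio is 100, that is, 20 dB. Higher values indicate that the estimated stem is closer to the reference.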