{"title":"Music Source Separation Using Generative Adversarial Network and U-Net","authors":"M. Satya, S. Suyanto","doi":"10.1109/ICoICT49345.2020.9166374","DOIUrl":null,"url":null,"abstract":"The separation of sound sources in the decomposition of music has become an interesting problem among scientists for the last 50 years. It has the main target of making it difficult for components in the music, such as vocals, bass, drums, and others. The results of sound separation have also been applied on many fields, such as remixing, repanning, and upmixing. In this paper, a new model based on a Generative Adversarial Network (GAN) is proposed to separate the music sources to rebuild the sound sources that exist in the music. The GAN architecture is built using U-net with VGG19 as an encoding block, mirror from VGG19 as an encoder block on the generator, and three times combinations of Convolution, Batch Normalization, and Leaky Rectified Linear Unit (LeakyReLU) blocks. An evaluation using the DSD100 dataset shows that the proposed model gives quite high average source to distortion ratios (SDR): 7.03 dB for bass, 18.72 dB for drums, 20.20 dB for vocal, and 12.73 dB for others.","PeriodicalId":113108,"journal":{"name":"2020 8th International Conference on Information and Communication Technology (ICoICT)","volume":"60 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 8th International Conference on Information and Communication Technology (ICoICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICoICT49345.2020.9166374","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The separation of sound sources in a musical mixture has been an interesting problem among researchers for the last 50 years. Its main target is to isolate the individual components of the music, such as vocals, bass, drums, and others. The results of sound separation have also been applied in many fields, such as remixing, repanning, and upmixing. In this paper, a new model based on a Generative Adversarial Network (GAN) is proposed to separate a musical mixture and rebuild the sound sources it contains. The GAN architecture is built using a U-Net with VGG19 as the encoder block and a mirrored VGG19 as the decoder block on the generator, together with three stacked combinations of Convolution, Batch Normalization, and Leaky Rectified Linear Unit (LeakyReLU) blocks. An evaluation on the DSD100 dataset shows that the proposed model achieves quite high average source-to-distortion ratios (SDR): 7.03 dB for bass, 18.72 dB for drums, 20.20 dB for vocals, and 12.73 dB for others.
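To make the reported figures concrete, the sketch below shows a simplified source-to-distortion ratio in decibels: the energy of the reference source divided by the energy of the estimation error. This is an assumption-laden simplification — the paper presumably uses the full BSS Eval SDR, which further decomposes the error into interference and artifact terms — but it conveys what the dB numbers measure.

```python
import math

def sdr(reference, estimate):
    """Simplified source-to-distortion ratio in dB:
    10 * log10(energy of reference / energy of residual error).
    (The full BSS Eval SDR also separates interference/artifact terms.)"""
    signal = sum(r * r for r in reference)
    error = sum((r - e) * (r - e) for r, e in zip(reference, estimate))
    if error == 0:
        return float("inf")  # a perfect estimate has unbounded SDR
    return 10.0 * math.log10(signal / error)

# Hypothetical example: an estimate off by a 1% gain error.
ref = [math.sin(0.01 * t) for t in range(1000)]
good = [r * 0.99 for r in ref]
print(round(sdr(ref, good), 2))  # 40.0 dB: small residual, high SDR
```

Under this simplified definition, the reported 20.20 dB for vocals means the reconstructed vocal track's energy is roughly a hundred times that of its residual distortion.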