Music Source Separation with Generative Adversarial Network and Waveform Averaging
Ryosuke Tanabe, Yuto Ichikawa, Takanori Fujisawa, M. Ikehara
2019 53rd Asilomar Conference on Signals, Systems, and Computers, pp. 1796-1800, November 2019
DOI: 10.1109/IEEECONF44664.2019.9048852
Abstract
The task of music source separation is to extract a target sound from a mixture. A popular approach trains a DNN to learn the relationship between the spectrum of the mixture and that of the separated sound. However, many DNN algorithms do not consider the clarity of the output sound, which tends to produce artifacts in the output spectrum. We adopt a generative adversarial network (GAN) to improve the clarity of the separated sound. In addition, we propose data augmentation by pitch-shifting. The performance of a DNN strongly depends on the quantity of its training data; a limited variety of training data gives the network poor knowledge of unknown sound sources. Learning from pitch-shifted signals enriches the training set and makes the network robust when estimating sound spectra at various pitches. Furthermore, at separation time we process pitch-shifted versions of the input and average the results to reduce artifacts. This proposal is based on the idea that a trained network can separate not only the original sound sources but also pitch-shifted ones. Compared with the conventional method, our method obtains well-separated signals with smaller artifacts.
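As a minimal sketch of the separate-and-average step described above, the code below assumes a hypothetical `separate(waveform)` callable standing in for the trained GAN separator, and uses librosa for pitch-shifting (both the function name and the choice of library are illustrative assumptions, not the paper's implementation). It shifts the mixture by a few semitone offsets, separates each copy, undoes the shift, and averages the aligned waveforms.

```python
import numpy as np
import librosa

def separate_with_averaging(mixture, sr, separate, steps=(-1, 0, 1)):
    """Separate a mixture at several pitch shifts and average the results.

    `separate` is a placeholder for the trained separator network:
    a callable mapping a mixture waveform to an estimated target waveform.
    """
    estimates = []
    for n in steps:
        # Shift the mixture by n semitones before separation.
        shifted = librosa.effects.pitch_shift(mixture, sr=sr, n_steps=n)
        estimate = separate(shifted)
        # Undo the shift so every estimate is back at the original pitch.
        restored = librosa.effects.pitch_shift(estimate, sr=sr, n_steps=-n)
        estimates.append(restored)
    # Trim to a common length (defensive; librosa keeps lengths fixed)
    # and average the waveforms sample-wise.
    min_len = min(len(e) for e in estimates)
    return np.mean([e[:min_len] for e in estimates], axis=0)
```

The intuition behind the averaging is that separation artifacts differ between pitch-shifted inputs, while the target content is consistent once the inverse shift realigns the estimates, so averaging tends to cancel artifacts rather than the signal.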