Convolutional GRU Networks based Singing Voice Separation
Harshit Harsh, Akhil Indraganti, S. Vanambathina, Bharat Siva Yaswanth Ramanam, V. S. Chandu, Hari Kishan Kondaveeti
2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1-5, published 2022-02-12
DOI: 10.1109/AISP53593.2022.9760616 (https://doi.org/10.1109/AISP53593.2022.9760616)
Citations: 1
Abstract
The study of singing voice is gaining importance with advances in the music industry. Separating a singing voice from a mixture and reconstructing it is similar to carrying an image from a source domain to a target domain while preserving its content representation. In our case, the mixed voice prints are transformed into their constituent components. A drawback of the U-Net convolutional architecture is that, in deeper models, the rate of learning may drop in the middle layers, so abstract features represented in those layers risk being learned poorly. In this work, we propose a Convolutional GRU Network (CGRUN) for the task of singing voice separation. It yields a causal system that is naturally suited to real-time processing applications. The target speech processing application is the separation of singing voices for voice mixing. Software evaluation in this experiment supports the use of CGRUN for singing voice separation. Singing voice separation and reconstruction is a task within the field of Music Information Retrieval (MIR).
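
The abstract does not spell out the CGRUN architecture, so the following is only a minimal sketch of a generic convolutional GRU cell, not the authors' model. It assumes a single channel, untrained random kernels, and 1-D convolutions along the frequency axis of each spectrogram frame; the names `ConvGRUCell`, `conv1d_same`, and `run_causal` are hypothetical. It illustrates why such a recurrence is causal: the output at frame t depends only on frames up to t.

```python
import math
import random


def conv1d_same(x, kernel):
    """1-D convolution along frequency (cross-correlation form), zero padded
    so the output has the same length as the input. Assumes an odd kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvGRUCell:
    """Single-channel convolutional GRU cell (illustrative and untrained)."""

    def __init__(self, kernel_size=3, seed=0):
        rng = random.Random(seed)

        def make_kernel():
            return [rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]

        # One input kernel and one hidden-state kernel per gate:
        # update (z), reset (r), candidate (n).
        self.wx = {g: make_kernel() for g in "zrn"}
        self.wh = {g: make_kernel() for g in "zrn"}

    def _gate(self, name, x, h, activation):
        a = conv1d_same(x, self.wx[name])
        b = conv1d_same(h, self.wh[name])
        return [activation(ai + bi) for ai, bi in zip(a, b)]

    def step(self, x, h):
        """Advance the hidden state by one time frame."""
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = self._gate("n", x, reset_h, math.tanh)
        # One common convention: blend the old state with the candidate.
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run_causal(cell, frames):
    """Process frames in time order; each output uses only past and present input."""
    h = [0.0] * len(frames[0])
    outputs = []
    for x in frames:
        h = cell.step(x, h)
        outputs.append(h)
    return outputs
```

Because the hidden state is carried forward frame by frame, such a cell can process audio as it arrives, which is the property the abstract refers to as suitability for real-time processing. A practical separator would use many channels, trained weights, and a mask or spectrogram output layer, none of which are shown here.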