Harshit Harsh, Akhil Indraganti, S. Vanambathina, Bharat Siva Yaswanth Ramanam, V. S. Chandu, Hari Kishan Kondaveeti
2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1-5. Published 2022-02-12. DOI: 10.1109/AISP53593.2022.9760616 (https://doi.org/10.1109/AISP53593.2022.9760616)
Convolutional GRU Networks based Singing Voice Separation
Singing voice analysis is gaining importance with advances in the music industry. Separating a singing voice from its accompaniment resembles image-to-image translation: content is carried from a source domain to a target domain while its content representation is preserved. In our case, the mixture spectrograms are transformed into their constituent components. A drawback of the U-Net convolutional architecture is that learning may slow down in the middle layers of deeper models, so there is a risk that the network learns to ignore the layers in which abstract features are represented. In this work, we propose a convolutional GRU network (CGRUN) for the task of singing voice separation. It yields a causal system that is naturally suited to real-time processing applications. The target speech-processing application is the separation of singing voices for voice mixing. Software-based evaluation supports the use of CGRUN for singing voice separation. Separating a singing voice from its accompaniment is a task within Music Information Retrieval (MIR).
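The abstract does not describe the layer layout of CGRUN, so the following is only a minimal sketch of the general idea and not the authors' architecture. It assumes a single convolutional GRU cell whose gates use small 1-D kernels along the frequency axis, run frame by frame over a magnitude spectrogram to produce a soft mask. The names `conv1d_same`, `ConvGRUCell`, and `separate`, the kernel values, and the toy spectrogram are all hypothetical, and the weights are untrained. The point it illustrates is causality: the output for frame t depends only on frames up to t.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation along frequency (the usual
    deep-learning 'convolution'); output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


class ConvGRUCell:
    """GRU cell whose matrix products are replaced by convolutions.
    Kernels are fixed, illustrative values (assumption: no training)."""

    def __init__(self, wz, uz, wr, ur, wh, uh):
        self.wz, self.uz = wz, uz  # update gate kernels (input, hidden)
        self.wr, self.ur = wr, ur  # reset gate kernels
        self.wh, self.uh = wh, uh  # candidate state kernels

    def step(self, x, h_prev):
        z = [sigmoid(a + b) for a, b in
             zip(conv1d_same(x, self.wz), conv1d_same(h_prev, self.uz))]
        r = [sigmoid(a + b) for a, b in
             zip(conv1d_same(x, self.wr), conv1d_same(h_prev, self.ur))]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(a + b) for a, b in
                zip(conv1d_same(x, self.wh), conv1d_same(gated, self.uh))]
        # One common GRU convention; some libraries swap the roles of z and 1 - z.
        return [(1.0 - zi) * hi + zi * ci
                for zi, hi, ci in zip(z, h_prev, cand)]


def separate(frames, cell):
    """Process spectrogram frames in time order and return masked frames.
    Only the current frame and the previous hidden state are used,
    so no future frame can influence an earlier output."""
    h = [0.0] * len(frames[0])
    vocals = []
    for x in frames:
        h = cell.step(x, h)
        mask = [sigmoid(hi) for hi in h]  # soft mask in (0, 1)
        vocals.append([m * xi for m, xi in zip(mask, x)])
    return vocals


# Toy usage: 3 time frames x 4 frequency bins of non-negative magnitudes.
demo_cell = ConvGRUCell(
    wz=[0.2, 0.5, 0.2], uz=[0.1, 0.3, 0.1],
    wr=[-0.1, 0.4, -0.1], ur=[0.0, 0.2, 0.0],
    wh=[0.3, 0.6, 0.3], uh=[0.1, -0.2, 0.1],
)
mixture = [[0.1, 0.5, 0.9, 0.2],
           [0.3, 0.4, 0.8, 0.1],
           [0.7, 0.2, 0.6, 0.5]]
vocal_estimate = separate(mixture, demo_cell)
```

Because the hidden state is updated one frame at a time, the same `step` call could be driven by a live audio stream, which is what makes a causal design attractive for real-time use. A practical system would learn the kernels from data and would also need a transform to and from the time domain, neither of which is shown here.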