Yu-Tao Chang, Yuan-Hong Yang, Yu-Huai Peng, Syu-Siang Wang, T. Chi, Yu Tsao, Hsin-Min Wang
{"title":"基于稀疏门控机制的混合专家语音转换系统","authors":"Yu-Tao Chang, Yuan-Hong Yang, Yu-Huai Peng, Syu-Siang Wang, T. Chi, Yu Tsao, Hsin-Min Wang","doi":"10.1109/ISCSLP49672.2021.9362072","DOIUrl":null,"url":null,"abstract":"Owing to the recent advancements in deep learning technology, the performance of voice conversion (VC) in terms of quality and similarity has significantly improved. However, complex computation is generally required for deep-learning-based VC systems. This can cause a notable latency, which limits the deployment of such VC systems in real-world applications. Therefore, increasing the efficiency of online computing has become an important task. In this study, we propose a novel mixture-of-experts (MoE) based VC system, termed MoEVC. The MoEVC system uses a gating mechanism to assign weights to feature maps to increase VC performance. In addition, applying sparse constraints on the gating mechanism can skip some convolution processes through elimination of redundant feature maps, thereby accelerating online computing. Experimental results show that by using proper sparse constraints, we can effectively reduce the FLOPs (floating-point operations) count by 70%, while improving VC performance in both objective evaluation and human subjective listening tests.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration\",\"authors\":\"Yu-Tao Chang, Yuan-Hong Yang, Yu-Huai Peng, Syu-Siang Wang, T. Chi, Yu Tsao, Hsin-Min Wang\",\"doi\":\"10.1109/ISCSLP49672.2021.9362072\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Owing to the recent advancements in deep learning technology, the performance of voice conversion (VC) in terms of quality and similarity has significantly improved. However, complex computation is generally required for deep-learning-based VC systems. This can cause a notable latency, which limits the deployment of such VC systems in real-world applications. Therefore, increasing the efficiency of online computing has become an important task. In this study, we propose a novel mixture-of-experts (MoE) based VC system, termed MoEVC. The MoEVC system uses a gating mechanism to assign weights to feature maps to increase VC performance. In addition, applying sparse constraints on the gating mechanism can skip some convolution processes through elimination of redundant feature maps, thereby accelerating online computing. Experimental results show that by using proper sparse constraints, we can effectively reduce the FLOPs (floating-point operations) count by 70%, while improving VC performance in both objective evaluation and human subjective listening tests.\",\"PeriodicalId\":279828,\"journal\":{\"name\":\"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCSLP49672.2021.9362072\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP49672.2021.9362072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration
Owing to the recent advancements in deep learning technology, the performance of voice conversion (VC) in terms of quality and similarity has significantly improved. However, complex computation is generally required for deep-learning-based VC systems. This can cause a notable latency, which limits the deployment of such VC systems in real-world applications. Therefore, increasing the efficiency of online computing has become an important task. In this study, we propose a novel mixture-of-experts (MoE) based VC system, termed MoEVC. The MoEVC system uses a gating mechanism to assign weights to feature maps to increase VC performance. In addition, applying sparse constraints on the gating mechanism can skip some convolution processes through elimination of redundant feature maps, thereby accelerating online computing. Experimental results show that by using proper sparse constraints, we can effectively reduce the FLOPs (floating-point operations) count by 70%, while improving VC performance in both objective evaluation and human subjective listening tests.