NSV-TTS: Non-Speech Vocalization Modeling And Transfer In Emotional Text-To-Speech
Haitong Zhang, Xinyuan Yu, Yue Lin
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2023-06-04
DOI: 10.1109/ICASSP49357.2023.10096033 (https://doi.org/10.1109/ICASSP49357.2023.10096033)
This paper addresses non-speech vocalization (NSV) modeling and transfer in emotional TTS. We propose an emotional TTS system (NSV-TTS) that models both NSV and emotional speech. The model uses self-supervised learning to extract unsupervised linguistic units (ULUs) for NSV labeling and zero-shot NSV transfer. We further propose token mixing and random masking to improve performance. We evaluate the proposed method on various NSV types and emotion classes. The experimental results show that it performs well on the zero-shot NSV transfer task. Lastly, we conduct ablation studies to investigate the proposed method further.
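The abstract does not say how random masking is applied, so the following is only a generic sketch of the idea on a sequence of discrete unit IDs, not the authors' implementation. The function name `random_mask`, the `MASK_ID` value, the masking probability, and the example unit IDs are all hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical mask token ID; the paper's actual choice is not given in the abstract.
MASK_ID = -1


def random_mask(units, mask_prob=0.15, mask_id=MASK_ID, seed=None):
    """Return a copy of `units` in which each element is independently
    replaced by `mask_id` with probability `mask_prob`."""
    rng = random.Random(seed)
    return [mask_id if rng.random() < mask_prob else u for u in units]


# Illustrative sequence of discrete unit IDs (made-up values).
units = [12, 12, 87, 87, 87, 3, 45, 45]
masked = random_mask(units, mask_prob=0.5, seed=0)
print(masked)
```

Each position is masked independently, so the sequence length is preserved and unmasked positions keep their original IDs. Whether NSV-TTS masks at the unit level, at the span level, or at some other granularity would need to be checked against the paper itself.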