Many-to-Many Singing Performance Style Transfer on Pitch and Energy Contours
Authors: Yu-Teng Hsu; Jun-You Wang; Jyh-Shing Roger Jang
Journal: IEEE Signal Processing Letters, vol. 32, pp. 166-170 (Journal Article)
Published: 2024-11-25
DOI: 10.1109/LSP.2024.3506858
URL: https://ieeexplore.ieee.org/document/10767407/
Citation count: 0
Abstract
Singing voice conversion (SVC) aims to convert the singer identity of a singing voice to that of another singer. However, most existing SVC systems convert only timbre information and leave all other information unchanged. This approach ignores other aspects of singer identity, particularly a singer's performance style, which is reflected in the pitch (F0) and energy (volume dynamics) contours of singing. To address this issue, this paper proposes a many-to-many singing performance style transfer system that converts the pitch and energy contours from one singer's style to another singer's. To this end, we use two AutoVC-like autoencoders, each with an information bottleneck, to automatically disentangle performance style from other musical content: one for the pitch contour and the other for the energy contour. Experimental results suggest that the proposed model can perform singing performance style transfer in a many-to-many conversion scenario, resulting in improved singer identity similarity to the target singer.
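To make the notion of an energy (volume dynamics) contour concrete, the following is a minimal sketch that computes a frame-wise RMS contour from a mono signal. It is an illustration only and not the authors' feature extraction: the abstract does not say how energy is measured, so the use of RMS, the function name `energy_contour`, and the `frame_length` and `hop_length` defaults are all assumptions.

```python
import math


def energy_contour(samples, frame_length=1024, hop_length=256):
    """Return the frame-wise RMS energy of a mono signal (sequence of floats).

    Assumption: RMS is used here as a simple stand-in for "energy";
    the paper may define its energy contour differently.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    contour = []
    # Slide a window over the signal; trailing samples that do not
    # fill a complete frame are dropped.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        rms = math.sqrt(sum(x * x for x in frame) / frame_length)
        contour.append(rms)
    return contour


if __name__ == "__main__":
    sr = 16000  # hypothetical sample rate
    # One second of a 220 Hz tone whose amplitude ramps up linearly,
    # i.e. a crude crescendo.
    tone = [(n / sr) * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
    contour = energy_contour(tone)
    print(len(contour), contour[0] < contour[-1])
```

In a system like the one described, a contour of this kind (alongside an F0 contour) would be the sequence that an autoencoder encodes and then decodes under a target singer's identity; that conversion step is not shown here.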
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.