Adversarial-Free Speaker Identity-Invariant Representation Learning for Automatic Dysarthric Speech Classification
Parvaneh Janbakhshi, I. Kodrasi
Interspeech, pp. 2138-2142, published 2022-09-18
DOI: 10.21437/interspeech.2022-402 (https://doi.org/10.21437/interspeech.2022-402)
Speech representations that are robust to pathology-unrelated cues, such as speaker identity information, have been shown to be advantageous for automatic dysarthric speech classification. A recently proposed technique for learning speaker identity-invariant representations for dysarthric speech classification is based on adversarial training. However, adversarial training can be challenging, unstable, and sensitive to training parameters. To avoid adversarial training, in this paper we propose to learn speaker identity-invariant representations using a feature separation framework that relies on mutual information minimization. Experimental results on a database of neurotypical and dysarthric speech show that the proposed adversarial-free framework successfully learns speaker identity-invariant representations. Further, it is shown that such representations yield dysarthric speech classification performance similar to that of representations obtained using adversarial training, while the training procedure is more stable and less sensitive to training parameters.
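To make the quantity behind "mutual information minimization" concrete, the following is a minimal sketch, not the framework proposed in the paper, whose estimator and architecture are not described in this abstract. It assumes toy discrete codes in place of learned representations and uses a simple plug-in estimate of the mutual information I(X; Y) between a code and the speaker label. The function name `mutual_information` and the example data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Illustrative only: real representations are continuous and would need
    a neural or variational estimator, which is not shown here.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: four utterances from two speakers.
speakers = ["s1", "s1", "s2", "s2"]
leaky_code = [0, 0, 1, 1]       # the code reveals the speaker exactly
invariant_code = [0, 1, 0, 1]   # the code is independent of the speaker

mi_leaky = mutual_information(leaky_code, speakers)          # log(2), about 0.693 nats
mi_invariant = mutual_information(invariant_code, speakers)  # 0.0 nats
```

In this toy setting, a representation that carries no speaker information has zero mutual information with the speaker label, which is the property that a mutual information minimization objective pushes toward.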