Semi-autonomous data enrichment based on cross-task labelling of missing targets for holistic speech analysis
Yue Zhang, Yuxiang Zhou, Jie Shen, Björn Schuller
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2016-03-20
DOI: 10.1109/ICASSP.2016.7472847 (https://doi.org/10.1109/ICASSP.2016.7472847)
Citation count: 13
Abstract
In this work, we propose a novel approach to large-scale data enrichment that addresses a major shortcoming of current research in computational paralinguistics: speaker attributes are studied in isolation, although strong interdependencies exist between them. The scarcity of multi-target databases, in which instances are labelled for several kinds of speaker characteristics, compounds this problem. The core idea of our work is to join existing data resources into a single holistic database with a multi-dimensional label space, using semi-supervised learning techniques to predict the missing labels. In the proposed Cross-Task Labelling (CTL) method, a model is first trained for each individual task on the labelled training sets of the selected databases. The trained classifiers are then used to cross-label the databases, so that each database receives predicted labels for the tasks it was not originally annotated for. To demonstrate the effectiveness of CTL, we evaluated it on likability, personality, and emotion recognition, representative tasks from the INTERSPEECH Computational Paralinguistics ChallengE (ComParE) series. The results show that CTL lays the foundation for holistic speech analysis by semi-autonomously annotating the existing databases while simultaneously expanding the multi-target label space, and that it achieves higher accuracy than the baselines of the challenges.
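The two CTL steps described in the abstract (train one model per task, then use those models to label the other databases) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the features or the learner, so `Database`, `NearestCentroid`, and `cross_task_label` are hypothetical names, a simple nearest-centroid classifier stands in for whatever model the paper actually uses, features are assumed to be pre-extracted fixed-length vectors, and each database's full label set is treated as its training set. The toy values at the bottom are invented for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist
from statistics import fmean


@dataclass
class Database:
    """Hypothetical single-task corpus: feature vectors labelled for one task only."""
    name: str
    task: str
    features: list[list[float]]
    labels: list[str]


class NearestCentroid:
    """Hypothetical stand-in classifier; the paper's actual learner is not specified here."""

    def fit(self, X, y):
        by_class = {}
        for x, label in zip(X, y):
            by_class.setdefault(label, []).append(x)
        # Each class centroid is the per-dimension mean of its feature vectors.
        self.centroids = {
            label: [fmean(col) for col in zip(*rows)]
            for label, rows in by_class.items()
        }
        return self

    def predict(self, X):
        # Assign each vector to the class whose centroid is closest (Euclidean).
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def cross_task_label(databases):
    # Step 1: pool all databases annotated for the same task and train one model per task.
    by_task = {}
    for db in databases:
        X, y = by_task.setdefault(db.task, ([], []))
        X.extend(db.features)
        y.extend(db.labels)
    models = {task: NearestCentroid().fit(X, y) for task, (X, y) in by_task.items()}

    # Step 2: cross-label each database with the models of the tasks it lacks,
    # merging everything into one holistic, multi-target list of instances.
    holistic = []
    for db in databases:
        predicted = {
            task: model.predict(db.features)
            for task, model in models.items()
            if task != db.task
        }
        for i, x in enumerate(db.features):
            labels = {db.task: (db.labels[i], "gold")}
            for task, preds in predicted.items():
                labels[task] = (preds[i], "predicted")
            holistic.append({"source": db.name, "features": x, "labels": labels})
    return holistic


# Invented toy data: two single-task corpora with 2-D feature vectors.
emotion_db = Database("emo_db", "emotion", [[0.0, 0.0], [1.0, 1.0]], ["neutral", "angry"])
likability_db = Database("lik_db", "likability", [[0.1, 0.0], [0.9, 1.2]], ["low", "high"])
rows = cross_task_label([emotion_db, likability_db])
for row in rows:
    print(row["source"], row["labels"])
```

Tagging every label as `"gold"` or `"predicted"` keeps provenance visible in the merged database; this matters if the enriched data is later used for training, because predicted labels inherit the errors of the classifier that produced them.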