{"title":"Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks","authors":"Bagus Tris Atmaja, A. Sasou","doi":"10.23919/APSIPAASC55919.2022.9980083","DOIUrl":null,"url":null,"abstract":"Understanding humans' emotions is a challenge for computers. Nowadays, research on speech emotion recognition has been conducted progressively. Instead of a speech, affective information may lay on short vocal bursts (i.e., cry when sad). In this study, we evaluated a recent self-supervised learning model to extract acoustic embedding for affective vocal bursts tasks. There are four tasks investigated on both regression and classification problems. Using similar architectures, we found the effectiveness of using a pre-trained model over the baseline methods. The study is further expanded to evaluate the different number of seeds, patiences, and batch sizes on the performance of the four tasks.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"326 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9980083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Understanding humans' emotions is a challenge for computers. Nowadays, research on speech emotion recognition has been conducted progressively. Instead of a speech, affective information may lay on short vocal bursts (i.e., cry when sad). In this study, we evaluated a recent self-supervised learning model to extract acoustic embedding for affective vocal bursts tasks. There are four tasks investigated on both regression and classification problems. Using similar architectures, we found the effectiveness of using a pre-trained model over the baseline methods. The study is further expanded to evaluate the different number of seeds, patiences, and batch sizes on the performance of the four tasks.