Addressing subjectivity in paralinguistic data labeling for improved classification performance: A case study with Spanish-speaking Mexican children using data balancing and semi-supervised learning

IF 3.1 | CAS Zone 3: Computer Science | JCR Q2: Computer Science, Artificial Intelligence

Computer Speech and Language | Pub Date: 2024-05-01 | DOI: 10.1016/j.csl.2024.101652

Daniel Fajardo-Delgado, Isabel G. Vázquez-Gómez, Humberto Pérez-Espinosa

Volume 88, Article 101652. Full text: https://www.sciencedirect.com/science/article/pii/S0885230824000354

Citations: 0

Abstract

Paralinguistics is an essential component of verbal communication, comprising elements that provide additional information to the language, such as emotional signals. However, the subjective nature of perceiving affective aspects, such as emotions, poses a significant challenge to the development of quality resources for training recognition models of paralinguistic features. Labelers may have different opinions and perceive different emotions from others, making it difficult to achieve a diverse and sufficient representation of considered categories. In this study, we focused on the automatic classification of paralinguistic aspects in Spanish-speaking Mexican children of elementary school age. However, the dataset presents a strong imbalance in all labeled aspects and a low agreement between the labelers. Furthermore, the audio samples were too short, making it challenging to accurately classify affective speech. To address these challenges, we propose a novel method that combines data balancing algorithms and semi-supervised learning to improve the classification performance of the trained models. Our method aims to mitigate the subjectivity involved in labeling paralinguistic data, thus advancing the development of robust and accurate recognition models of affective aspects in speech.
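The abstract does not say which balancing algorithms or which semi-supervised scheme the authors used, so the following is only a minimal sketch of the general idea under stated assumptions: random oversampling stands in for "data balancing", self-training with a confidence threshold stands in for "semi-supervised learning", and a nearest-centroid rule stands in for the classifier. All function names, the `min_margin` parameter, the labels, and the toy 2-D feature vectors are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def oversample(X, y, seed=0):
    """Random oversampling: duplicate minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_bal, y_bal = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_bal.append(rng.choice(pool))
            y_bal.append(label)
    return X_bal, y_bal


def centroids(X, y):
    """Mean feature vector per class (a stand-in for a real classifier)."""
    counts = Counter(y)
    sums = {}
    for x, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(cents, x):
    """Return (nearest label, margin), where margin is the distance gap
    between the two closest centroids; a larger gap means more confidence."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, lab)
        for lab, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=5):
    """Self-training: rebalance, fit, pseudo-label the confident unlabeled
    samples, and repeat until nothing new is added or rounds run out."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(*oversample(X_lab, y_lab))
        remaining, added = [], False
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                X_lab.append(x)
                y_lab.append(label)
                added = True
            else:
                remaining.append(x)  # too ambiguous, leave unlabeled
        pool = remaining
        if not added:
            break
    return X_lab, y_lab, pool


# Toy, made-up data: three "neutral" samples and one "excited" sample.
X_lab = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0]]
y_lab = ["neutral", "neutral", "neutral", "excited"]
X_unlab = [[4.8, 5.1], [0.1, 0.1], [2.5, 2.5]]

X_final, y_final, still_unlabeled = self_train(X_lab, y_lab, X_unlab)
print(Counter(y_final), still_unlabeled)
```

In this sketch, samples that sit between classes stay unlabeled rather than receiving a forced label, which is one simple way to avoid amplifying ambiguous cases; whether the paper handles low-agreement samples this way is not stated in the abstract.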


Source journal

Computer Speech and Language (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Review time: 22.9 weeks

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.