Ruvini Sanjeewa, Ravi Iyer, Pragalathan Apputhurai, Nilmini Wickramasinghe, Denny Meyer
Title: Machine Learning Approach to Identifying Empathy Using the Vocals of Mental Health Helpline Counselors: Algorithm Development and Validation
Journal: JMIR Formative Research, volume 9, e67835 (published 2025-04-16)
DOI: 10.2196/67835 (https://doi.org/10.2196/67835)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12017608/pdf/
Abstract
Background: This study aimed to identify the vocal features embedded in empathic counselor speech, using samples of calls to a mental health helpline service.
Objective: This study aimed to produce an algorithm that identifies empathy from these features and could serve as a training guide for counselors and conversational agents who need to convey empathy in their voices.
Methods: Two annotators with a psychology background and English heritage rated empathy for 57 calls involving female counselors, as well as for multiple short segments within each call. In a sample of 6 calls rated by both annotators, the 2 sets of ratings were well correlated. Vocal features were extracted from the call segments, and statistical variable selection methods, such as L1-penalized LASSO (Least Absolute Shrinkage and Selection Operator) and forward selection, identified 14 vocal features significantly associated with empathic speech. Generalized additive mixed models (GAMMs), binary logistic regression with splines, and random forest models were then used to obtain an algorithm that differentiated between high- and low-empathy call segments.
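To illustrate how an L1 penalty performs variable selection, the following is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the study's implementation: the abstract does not state the software, penalty strength, or feature names used, so the function names, the `lam` and `lr` settings, and the toy data are all assumptions made for illustration.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero, clipping small values to exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=500):
    """Fit logistic regression with an L1 penalty on the weights (intercept unpenalized).

    X is a list of feature rows, y a list of 0/1 labels.
    lam (penalty strength) and lr (step size) are illustrative defaults.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Gradient step on the log-loss, then soft-threshold the weights.
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def selected_features(w, names):
    """Return the names of features whose coefficient was not shrunk to zero."""
    return [name for name, wj in zip(names, w) if wj != 0.0]


# Toy data (hypothetical): feature_a separates the classes, feature_b carries no signal.
X = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
y = [1, 1, 0, 0]
w, b = fit_l1_logistic(X, y)
print(selected_features(w, ["feature_a", "feature_b"]))
```

Features whose coefficients are driven exactly to zero drop out of the model, which is what makes the L1 penalty usable as a selection step before fitting the final classifiers.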
Results: The binary logistic regression model achieved higher predictive accuracy for empathy (area under the curve [AUC]=0.617, 95% CI 0.613-0.622) than the GAMM (AUC=0.605, 95% CI 0.601-0.609) and the random forest model (AUC=0.600, 95% CI 0.595-0.604). These differences were statistically significant, as evidenced by the nonoverlapping 95% CIs for AUC between the logistic model and each of the other 2 models. The DeLong test supported these results, showing a significant difference between the binary logistic model and both the random forest (D=6.443, df=186283, P<.001) and the GAMM (Z=5.846, P<.001). These findings indicate that the binary logistic regression model outperforms the other 2 models in predictive accuracy for empathy classification.
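The AUC comparison can be illustrated with a short sketch that computes AUC as a rank statistic, derives a percentile bootstrap CI, and checks whether two intervals overlap. The abstract does not say how the reported CIs were obtained, so the bootstrap here is an assumption, as are all function names. A naive bootstrap over segments also ignores the clustering of segments within calls, which the study's models may have handled differently.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# The 95% CIs reported in the abstract.
logistic_ci = (0.613, 0.622)
gamm_ci = (0.601, 0.609)
forest_ci = (0.595, 0.604)
print(intervals_overlap(logistic_ci, gamm_ci))    # False
print(intervals_overlap(logistic_ci, forest_ci))  # False
```

Nonoverlapping intervals are a conservative indication of a difference. A paired test on the same segments, such as the DeLong test cited in the abstract, is the more direct comparison.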
Conclusions: This study suggests that identifying empathy from vocal features alone is challenging, and further research involving multimodal models (eg, models incorporating facial expression, words used, and vocal features) is encouraged for detecting empathy in the future. This study has several limitations, including a relatively small sample of calls and only 2 empathy raters. Future research should include multiple raters with varied backgrounds to explore how rater background affects perceptions of empathy. Additionally, considering counselor vocals from larger, more heterogeneous populations, including mixed-gender samples, will allow a more general exploration of the factors influencing the level of empathy projected in counselor voices.