Assessing the Expressive Language Levels of Autistic Children in Home Intervention

IF 4.5 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, CYBERNETICS

IEEE Transactions on Computational Social Systems | Pub Date: 2025-06-04 | DOI: 10.1109/TCSS.2025.3563733

Yueran Pan; Biyuan Chen; Wenxing Liu; Ming Cheng; Dong Zhang; Hongzhu Deng; Xiaobing Zou; Ming Li

Volume 12, Issue 5, pp. 3647-3659. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/11024030/

Citations: 0


Assessing the Expressive Language Levels of Autistic Children in Home Intervention

The World Health Organization (WHO) has established the caregiver skill training (CST) program, designed to equip families of children diagnosed with autism spectrum disorder with essential caregiving skills. The joint engagement rating inventory (JERI) protocol evaluates participants' engagement levels within the CST program. Traditionally, rating the expressive language level and use (EXLA) item in JERI relies on retrospective video analysis by qualified professionals, which incurs substantial labor costs. This study introduces a multimodal behavioral signal-processing framework that automatically analyzes both child and caregiver behaviors in order to rate EXLA. First, raw audio and video signals are segmented into short intervals via voice activity detection, speaker diarization, and speaker age classification, which both removes nonspeech content and tags each segment with its speaker. Next, we extract a set of audio-visual features comprising our proposed interpretable, hand-crafted textual features, end-to-end audio embeddings, and end-to-end video embeddings. Finally, these features are fused at the feature level to train a linear regression model that predicts the EXLA scores. Our framework has been evaluated on the largest in-the-wild database currently available under the CST program. Experimental results indicate that the proposed system achieves a Pearson correlation coefficient of 0.768 against the expert ratings, a promising performance comparable to that of human experts.
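The last stage of the pipeline described in the abstract (feature-level fusion, a linear regression model, and evaluation by Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature dimensions, the synthetic data, the train/test split, and the helper names `fuse_features`, `fit_linear_regression`, `predict`, and `pearson` are all assumptions made for illustration, and the printed correlation has no relation to the 0.768 reported in the paper.

```python
import numpy as np


def fuse_features(*blocks):
    """Feature-level fusion: concatenate per-session feature blocks column-wise."""
    return np.concatenate(blocks, axis=1)


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept (last weight is the bias)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict(X, weights):
    """Apply the fitted weights, appending the same intercept column."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ weights


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


# Synthetic stand-ins for the three feature groups (hypothetical sizes).
rng = np.random.default_rng(0)
n_sessions = 200
text_feats = rng.normal(size=(n_sessions, 4))   # hand-crafted textual features
audio_emb = rng.normal(size=(n_sessions, 8))    # audio embeddings
video_emb = rng.normal(size=(n_sessions, 8))    # video embeddings

X = fuse_features(text_feats, audio_emb, video_emb)

# Placeholder "expert ratings": a noisy linear function of the fused features.
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + rng.normal(scale=0.5, size=n_sessions)

# Fit on the first 150 sessions, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, None)
w = fit_linear_regression(X[train], y[train])
r = pearson(predict(X[test], w), y[test])
print(f"Pearson r on held-out synthetic sessions: {r:.3f}")
```

Fusing at the feature level means the regression sees a single concatenated vector per session, so the learned weights stay directly attributable to individual features. The `pearson` helper can be cross-checked against `np.corrcoef`.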


Source journal

IEEE Transactions on Computational Social Systems (Social Sciences: Social Sciences (miscellaneous))

CiteScore: 10.00

Self-citation rate: 20.00%

Articles published: 316

About the journal: IEEE Transactions on Computational Social Systems focuses on topics such as modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.