Machine learning and deep learning systems for automated measurement of "advanced" theory of mind: Reliability and validity in children and adolescents.
Rory T Devine, Venelin Kovatchev, Imogen Grumley Traynor, Phillip Smith, Mark Lee
{"title":"Machine learning and deep learning systems for automated measurement of \"advanced\" theory of mind: Reliability and validity in children and adolescents.","authors":"Rory T Devine, Venelin Kovatchev, Imogen Grumley Traynor, Phillip Smith, Mark Lee","doi":"10.1037/pas0001186","DOIUrl":null,"url":null,"abstract":"<p><p>Understanding individual differences in theory of mind (ToM; the ability to attribute mental states to others) in middle childhood and adolescence hinges on the availability of robust and scalable measures. Open-ended response tasks yield valid indicators of ToM but are labor intensive and difficult to compare across studies. We examined the reliability and validity of new machine learning and deep learning neural network automated scoring systems for measuring ToM in children and adolescents. Two large samples of British children and adolescents aged between 7 and 13 years (Sample 1: N = 1,135, Mage = 10.22 years, SD = 1.45; Sample 2: N = 1,020, Mage = 10.36 years, SD = 1.27) completed the silent film and strange stories tasks. Teachers rated Sample 2 children's social competence with peers. A single latent-factor explained variation in performance on both the silent film and strange stories task (in Sample 1 and 2) and test performance was sensitive to age-related differences and individual differences within each age-group. A deep learning neural network automated scoring system trained on Sample 1 exhibited interrater reliability and measurement invariance with manual ratings in Sample 2. Validity of ratings from the automated scoring system was supported by unique positive associations between ToM and teacher-rated social competence. The results demonstrate that reliable and valid measures of ToM can be obtained using the new freely available deep learning neural network automated scoring system to rate open-ended text responses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).</p>","PeriodicalId":20770,"journal":{"name":"Psychological Assessment","volume":"35 2","pages":"165-177"},"PeriodicalIF":3.3000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Psychological Assessment","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1037/pas0001186","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, CLINICAL","Score":null,"Total":0}
Citations: 2
Abstract
Understanding individual differences in theory of mind (ToM; the ability to attribute mental states to others) in middle childhood and adolescence hinges on the availability of robust and scalable measures. Open-ended response tasks yield valid indicators of ToM but are labor intensive and difficult to compare across studies. We examined the reliability and validity of new machine learning and deep learning neural network automated scoring systems for measuring ToM in children and adolescents. Two large samples of British children and adolescents aged between 7 and 13 years (Sample 1: N = 1,135, Mage = 10.22 years, SD = 1.45; Sample 2: N = 1,020, Mage = 10.36 years, SD = 1.27) completed the silent film and strange stories tasks. Teachers rated Sample 2 children's social competence with peers. A single latent factor explained variation in performance on both the silent film and strange stories tasks (in Samples 1 and 2), and test performance was sensitive to age-related differences and to individual differences within each age group. A deep learning neural network automated scoring system trained on Sample 1 exhibited interrater reliability and measurement invariance with manual ratings in Sample 2. Validity of ratings from the automated scoring system was supported by unique positive associations between ToM and teacher-rated social competence. The results demonstrate that reliable and valid measures of ToM can be obtained by using the new, freely available deep learning neural network automated scoring system to rate open-ended text responses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
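As a rough illustration of the workflow the abstract describes (training an automated scorer on manually rated open-ended responses from one sample, then checking its agreement with manual ratings in another sample), the sketch below uses a simple bag-of-words classifier and quadratic-weighted kappa. This is a hypothetical stand-in, not the authors' deep learning system; the example responses, the 0-2 scoring scale, and the choice of classifier are all placeholder assumptions for illustration only.

```python
# Minimal sketch (assumed workflow, not the authors' system): train a text
# scorer on manually rated responses, score held-out responses automatically,
# and quantify agreement between automated and manual ratings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import cohen_kappa_score

# Placeholder training data: open-ended ToM responses with manual scores (0-2).
train_texts = [
    "He thinks the box is empty",
    "She wanted to trick him so he would look silly",
    "Don't know",
    "He believes his friend forgot about the meeting",
    "Because the box was there",
]
train_scores = [1, 2, 0, 2, 0]

# Placeholder held-out responses with their manual ratings.
test_texts = [
    "She is pretending to like the present",
    "No idea",
    "He thought they were laughing at his joke",
    "Because she said so",
]
test_manual_scores = [2, 0, 1, 0]

# Train a simple bag-of-words classifier on the manually scored responses.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_scores)

# Score the held-out responses automatically.
auto_scores = model.predict(test_texts)

# Interrater reliability between automated and manual ratings; quadratic
# weights treat the 0-2 scores as ordinal, penalizing larger disagreements more.
kappa = cohen_kappa_score(test_manual_scores, auto_scores, weights="quadratic")
print(f"Automated vs. manual quadratic-weighted kappa: {kappa:.2f}")
```

In practice, a deep learning system like the one the abstract reports would replace the bag-of-words classifier, and reliability would be evaluated on full samples rather than a handful of items, but the train-on-one-sample, validate-against-manual-ratings-in-another structure is the same.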
Journal description:
Psychological Assessment is concerned mainly with empirical research on measurement and evaluation relevant to the broad field of clinical psychology. Submissions are welcome in the areas of assessment processes and methods. Included are:
- clinical judgment and the application of decision-making models
- paradigms derived from basic psychological research in cognition, personality–social psychology, and biological psychology
- development, validation, and application of assessment instruments, observational methods, and interviews