Yusuke Shono, Berivan Ece, Emily H Ho, Aaron J Kaat, Erica M LaForte, Ezgi Ayturk, Richard Gershon
Journal: Psychological Assessment, Vol. 36, No. 12, pp. 760-771
DOI: 10.1037/pas0001350
Publication date: 2024-12-01
Publication type: Journal Article
A comparison of scoring algorithms for the NIH Toolbox executive function tasks in a U.S. norming sample.
Executive function (EF) has been extensively linked to behavioral, clinical, and educational outcomes. There have been, however, few systematic investigations into how best to score EF tasks using speed and accuracy performance, and in particular how to generate a summary, norm-referenced score. Using data from an updated norming study for the NIH Toolbox Version 3 (NIHTB V3) in the general U.S. population aged 3-85 (N = 3,794; 52.3% female; mean age = 25.06 years, SD = 22.92), we empirically evaluated and compared several scoring algorithms for two EF tests: the Dimensional Change Card Sort (a test of cognitive flexibility) and the Flanker (a test of inhibitory control). Results showed that joint scoring algorithms integrating speed and accuracy into a single score (namely, the rate-correct score, the linear integrated speed-accuracy score, and the speed-accuracy additive score) provided more robust psychometric evidence for the EF tests than single-index scores of accuracy or speed alone. These integrated speed-accuracy scores were consistent and stable within and across tasks and time points; correlated with another well-validated EF measure but, as predicted, unrelated to a crystallized intelligence measure; and increased rapidly from early childhood through late adolescence/early adulthood before declining toward late adulthood. The rate-correct score was notably free from ceiling effects and sensitive to age-related changes and variability in EF performance. Among the algorithms examined, we recommend the rate-correct score, which served as the basis for generating the new NIHTB V3 norm-referenced scores, with good test-retest reliability (Dimensional Change Card Sort = .77, Flanker = .81) and acceptable convergent and discriminant validity. (PsycInfo Database Record (c) 2024 APA, all rights reserved.)
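Two of the integrated algorithms named in the abstract have standard definitions in the speed-accuracy literature: the rate-correct score (correct responses per unit of total response time; Woltz & Was, 2006) and the linear integrated speed-accuracy score (LISAS; mean RT plus an error-rate penalty scaled by the RT-to-error SD ratio; Vandierendonck, 2017). The sketch below illustrates those textbook formulas from per-trial accuracy flags and response times; it is not the NIHTB V3 implementation, whose trial filtering, scaling, and norming steps are not specified in this abstract.

```python
import statistics

def rate_correct_score(correct_flags, rts):
    """Rate-correct score (RCS): number of correct responses per
    second of total response time (Woltz & Was, 2006).

    correct_flags: per-trial accuracy (1 = correct, 0 = error)
    rts:           per-trial response times in seconds
    """
    return sum(correct_flags) / sum(rts)

def lisas(correct_flags, rts):
    """Linear integrated speed-accuracy score (LISAS;
    Vandierendonck, 2017): mean RT penalized by the proportion of
    errors, scaled by the ratio of the RT and error-rate SDs."""
    n = len(correct_flags)
    pe = 1 - sum(correct_flags) / n           # proportion of errors
    mean_rt = statistics.mean(rts)
    sd_rt = statistics.stdev(rts)
    sd_pe = statistics.stdev(correct_flags)   # SD of the 0/1 accuracy vector
    if sd_pe == 0:                            # all trials correct (or all errors):
        return mean_rt                        # penalty term is undefined, RT only
    return mean_rt + (sd_rt / sd_pe) * pe
```

For example, with flags [1, 1, 1, 0] and RTs [0.5, 0.6, 0.7, 0.8] s, the RCS is 3 / 2.6 ≈ 1.15 correct responses per second, while LISAS yields the mean RT (0.65 s) inflated by the error penalty. Higher RCS is better; lower LISAS is better, a sign difference worth noting when comparing the two.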
Journal description:
Psychological Assessment is concerned mainly with empirical research on measurement and evaluation relevant to the broad field of clinical psychology. Submissions are welcome in the areas of assessment processes and methods, including:
- clinical judgment and the application of decision-making models
- paradigms derived from basic psychological research in cognition, personality-social psychology, and biological psychology
- development, validation, and application of assessment instruments, observational methods, and interviews