Process-based measures in high-stakes testing: practical implications for construct validity within military aviation selection

Joseph T Coyne, Laura Jamison, Kaylin Strong, Ciara Sibley, Cyrus Foroughi, Sarah Melick

Cognitive Research: Principles and Implications, 10(1), 51 (published 2025-08-20)
DOI: 10.1186/s41235-025-00660-3
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12364792/pdf/
Citations: 0
Abstract
This paper examines how process-based spatial ability and attention measures taken within a high-stakes battery used to select pilots in the US Navy compare to lab-based measures of the same constructs. Process-based measures typically function by having individuals perform either a novel task or a familiar task with novel stimuli. However, applicants often spend time practicing the tasks prior to taking the battery. A group of 307 Naval Flight Students participated in the study, in which they took several spatial ability, attention, and general processing measures. One of the spatial tasks used in the study was the same as the spatial task in the Navy's pilot selection battery, which all of the participants had taken. All of the lab spatial ability measures, including the one used in the selection battery, were highly correlated and loaded onto the same spatial ability factor. However, the high-stakes spatial subtest was not correlated with any of the lab spatial measures, including the same test administered in the lab. The lab spatial ability data were also correlated with training outcomes, whereas the high-stakes process-based spatial and attention measures were not. The high-stakes attention measure was weakly correlated with some of the general processing measures. This pattern of results suggests that familiarity with the spatial and attention tasks in the high-stakes environment may be negating those tests' ability to measure the constructs they were designed to measure, and also reducing their effectiveness in predicting training performance.

Statement of significance: This paper addresses an increasingly difficult challenge the Navy faces within aviation selection: applicants are highly motivated and have access to unofficial replicas of the Navy's test battery. The challenge is specific to process-based measures, such as spatial ability and attention, that rely on some degree of novelty to work. When applicants practice these types of tests, they can practice to the test, memorize items, and learn strategies, which impairs the test's ability to measure the cognitive construct it was designed to measure and reduces its ability to predict flight training outcomes. This is particularly problematic because unofficial test-preparation software can replicate a new test within days. While the data presented here are limited to spatial ability and attention within military pilot selection, the findings apply to a much broader community of researchers. Anyone developing a high-stakes test with a large and motivated applicant pool may see their process-based measures perform differently in a high-stakes environment than in a low-stakes laboratory one in which participants are naïve to the tasks. The question of the extent to which practice can alter the effectiveness of high-stakes tests is an important one. The results of the paper suggest that test developers should assume participants are practiced, and should assess the extent to which practice on a process-based measure impacts the task's ability to measure the construct of interest and predict performance.
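The convergent-validity check the abstract describes amounts to correlating scores across administrations: lab measures of the same construct should correlate highly with one another, while a practice-contaminated high-stakes administration may not correlate with any of them. A minimal sketch of that check is below, using entirely made-up illustrative scores (none of the paper's actual data); the near-ceiling, range-compressed `high_stakes` values stand in for what heavy test preparation might produce.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical scores for illustration only (not the study's data):
# two lab spatial tests that track each other, and a high-stakes
# administration where practice has compressed and decoupled scores.
lab_test_a  = [52, 61, 47, 70, 58, 66, 43, 75]
lab_test_b  = [50, 64, 45, 72, 55, 69, 40, 78]
high_stakes = [86, 90, 89, 87, 90, 86, 87, 89]

r_lab   = pearson_r(lab_test_a, lab_test_b)    # high: same construct
r_mixed = pearson_r(lab_test_a, high_stakes)   # near zero: decoupled
print(f"lab-lab r = {r_lab:.2f}, lab-high-stakes r = {r_mixed:.2f}")
```

In this toy example the two lab tests correlate strongly while the lab-to-high-stakes correlation is near zero, mirroring the pattern the paper reports for its spatial measures; the real analysis additionally used factor loadings and training-outcome criteria.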