Process-based measures in high-stakes testing: practical implications for construct validity within military aviation selection

Joseph T Coyne, Laura Jamison, Kaylin Strong, Ciara Sibley, Cyrus Foroughi, Sarah Melick

Cognitive Research: Principles and Implications, 10(1), 51 (published 2025-08-20)
DOI: 10.1186/s41235-025-00660-3
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12364792/pdf/
Citations: 0
Abstract
This paper examines how process-based spatial ability and attention measures taken within a high-stakes battery used to select pilots in the US Navy compare to lab-based measures of the same constructs. Process-based measures typically function by having individuals perform either a novel task or a familiar task with novel stimuli. However, applicants often spend time practicing the tasks prior to taking the battery. A group of 307 Naval Flight Students participated in the study, in which they took several spatial ability, attention, and general processing measures. One of the spatial tasks used in the study was the same as the spatial task in the Navy's pilot selection battery, which all of the participants had taken. All of the lab spatial ability measures, including the one used in the selection battery, were highly correlated and loaded onto the same spatial ability factor. However, the high-stakes spatial subtest was not correlated with any of the lab spatial measures, including the same test administered in the lab. The lab spatial ability data were also correlated with training outcomes, whereas the high-stakes process-based spatial and attention measures were not. The high-stakes attention measure was weakly correlated with some of the general processing measures. This pattern of results suggests that familiarity with the spatial and attention tasks in the high-stakes environment may be negating those tests' ability to measure the constructs they were designed to measure, and also reducing their effectiveness in predicting training performance.

Statement of significance: This paper addresses an increasingly difficult challenge the Navy faces within aviation selection: applicants are highly motivated and have access to unofficial replicas of the Navy's test battery. The challenge is specific to process-based measures, such as spatial ability and attention, that rely on some degree of novelty to work. When applicants practice these types of tests, they can practice to the test, memorize items, and learn strategies, which impairs the test's ability to measure the cognitive construct it was designed to measure and reduces its ability to predict flight training outcomes. This is particularly problematic because unofficial test-preparation software can replicate a new test within days. While the data presented here are limited to spatial ability and attention within military pilot selection, the findings apply to a much broader community of researchers. Anyone developing a high-stakes test with a large and motivated applicant pool may see their process-based measures perform differently in a high-stakes environment than in a low-stakes laboratory one in which participants are naïve to the tasks. The question of the extent to which practice can alter the effectiveness of high-stakes tests is an important one. The results of the paper suggest that test developers should assume participants are practiced, and should assess the extent to which practice on a process-based measure impacts the task's ability to measure the construct of interest and predict performance.
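The convergent-validity check the abstract describes amounts to correlating scores across administrations: lab measures of the same construct should correlate highly with one another, while a practice-contaminated high-stakes administration may not correlate with any of them. A minimal sketch of that check is below, using entirely made-up illustrative scores (none of the paper's actual data); the near-ceiling, range-compressed `high_stakes` values stand in for what heavy test preparation might produce.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical scores for illustration only (not the study's data):
# two lab spatial tests that track each other, and a high-stakes
# administration where practice has compressed and decoupled scores.
lab_test_a  = [52, 61, 47, 70, 58, 66, 43, 75]
lab_test_b  = [50, 64, 45, 72, 55, 69, 40, 78]
high_stakes = [86, 90, 89, 87, 90, 86, 87, 89]

r_lab   = pearson_r(lab_test_a, lab_test_b)    # high: same construct
r_mixed = pearson_r(lab_test_a, high_stakes)   # near zero: decoupled
print(f"lab-lab r = {r_lab:.2f}, lab-high-stakes r = {r_mixed:.2f}")
```

In this toy example the two lab tests correlate strongly while the lab-to-high-stakes correlation is near zero, mirroring the pattern the paper reports for its spatial measures; the real analysis additionally used factor loadings and training-outcome criteria.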