{"title":"Detecting Test-Taking Engagement in Changing Test Contexts","authors":"Blair Lehman, Jesse R. Sparks, Jonathan Steinberg","doi":"10.1002/ets2.12384","DOIUrl":null,"url":null,"abstract":"<p>Over the last 20 years, many methods have been proposed to use process data (e.g., response time) to detect changes in engagement during the test-taking process. However, many of these methods were developed and evaluated in highly similar testing contexts: 30 or more single-select multiple-choice items presented in a linear, fixed sequence in which an item must be answered before progressing to the next item. However, this testing context becomes less and less representative of testing contexts in general as the affordances of technology are leveraged to provide more diverse and innovative testing experiences. The 2019 National Assessment of Educational Progress (NAEP) mathematics administration for grades 8 and 12 testing context represents an example use case that differed significantly from assessments that were typically used in previous research on test-taking engagement (e.g., number of items, item format, navigation). Thus, we leveraged this use case to re-evaluate the utility of an existing engagement detection method: normative threshold method. We decomposed the normative threshold method to evaluate its alignment with this use case and then evaluated 25 variations of this threshold-setting method with previously established evaluation criteria. Our findings revealed that this critical analysis of the threshold-setting method's alignment with the NAEP testing context could be used to identify the most appropriate variation of this method for this use case. We discuss the broader implications for engagement detection as testing contexts continue to evolve.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2024 1","pages":"1-15"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12384","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ETS Research Report Series","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ets2.12384","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Social Sciences","Score":null,"Total":0}
Citations: 0
Abstract
Over the last 20 years, many methods have been proposed that use process data (e.g., response time) to detect changes in engagement during the test-taking process. However, many of these methods were developed and evaluated in highly similar testing contexts: 30 or more single-select multiple-choice items presented in a linear, fixed sequence, in which each item must be answered before progressing to the next. This testing context becomes less and less representative of testing contexts in general as the affordances of technology are leveraged to provide more diverse and innovative testing experiences. The 2019 National Assessment of Educational Progress (NAEP) mathematics administration for grades 8 and 12 is an example of a testing context that differed significantly from the assessments typically used in previous research on test-taking engagement (e.g., in number of items, item format, and navigation). We therefore leveraged this use case to re-evaluate the utility of an existing engagement detection method: the normative threshold method. We decomposed the normative threshold method to evaluate its alignment with this use case and then evaluated 25 variations of this threshold-setting method against previously established evaluation criteria. Our findings revealed that this critical analysis of the threshold-setting method's alignment with the NAEP testing context could be used to identify the most appropriate variation of the method for this use case. We discuss the broader implications for engagement detection as testing contexts continue to evolve.
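The abstract references the normative threshold method without detail. For orientation, the following is a minimal Python sketch of the commonly cited NT10 variant (attributed to Wise and Ma, 2012), in which an item's rapid-guessing threshold is set to 10% of its mean response time, often capped at 10 seconds. The multiplier, cap, data, and function names here are illustrative assumptions; they are not the 25 variations evaluated in the report.

```python
# Sketch of the normative threshold (NT) method for flagging rapid-guessing
# behavior from item response times. Assumes times are in seconds; the 10%
# multiplier and 10-second cap follow the commonly cited NT10 settings and
# are NOT values taken from this report.

from statistics import mean

def nt_thresholds(rt_by_item, pct=0.10, cap=10.0):
    """Set each item's threshold to pct of its mean response time, capped."""
    return {item: min(pct * mean(rts), cap) for item, rts in rt_by_item.items()}

def flag_rapid_responses(rt_by_item, thresholds):
    """Mark a response as rapid guessing when it falls below the item threshold."""
    return {item: [rt < thresholds[item] for rt in rts]
            for item, rts in rt_by_item.items()}

# Hypothetical response times (seconds) for two items.
rts = {"item1": [3.2, 45.0, 52.1, 1.0, 38.4],
       "item2": [60.2, 2.5, 41.7, 55.0, 4.8]}
thresholds = nt_thresholds(rts)
flags = flag_rapid_responses(rts, thresholds)
# A test taker's response-time effort (RTE) is then the share of their
# responses not flagged as rapid guesses.
```

Variations of this threshold-setting method, such as those examined in the report, typically alter choices like the percentage multiplier, the cap, or the normative statistic (e.g., mean vs. median response time).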