Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models

IF 1.4 | Zone 4, Psychology | JCR Q3, PSYCHOLOGY, APPLIED
Yang Jiang, Mo Zhang, Jiangang Hao, Paul Deane, Chen Li
Journal of Educational Measurement, 61(4), pp. 571-594. Published 2024-10-18.
DOI: 10.1111/jedm.12416
https://onlinelibrary.wiley.com/doi/10.1111/jedm.12416
Citations: 0

Abstract

The emergence of sophisticated AI tools such as ChatGPT, coupled with the transition to remote delivery of educational assessments in the COVID-19 era, has led to increasing concerns about academic integrity and test security. Using AI tools, test takers can produce high-quality texts effortlessly and use them to game assessments. It is thus critical to detect these nonauthentic texts to ensure test integrity. In this study, we leveraged keystroke logs—recordings of every keypress—to build machine learning (ML) detectors of nonauthentic texts in a large-scale writing assessment. We focused on investigating the fairness of the detectors across demographic subgroups to ensure that nongenuine writing can be predicted equally well across subgroups. Results indicated that keystroke dynamics were effective in identifying nonauthentic texts. While the ML models were slightly more likely to misclassify the original responses submitted by male test takers as consisting of nonauthentic texts than those submitted by females, the effect sizes were negligible. Furthermore, balancing demographic distributions and class labels did not consistently mitigate detector bias across predictive models. Findings of this study not only provide implications for using behavioral data to address test security issues, but also highlight the importance of evaluating the fairness of predictive models in educational contexts.
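
The subgroup comparison described in the abstract (how often authentic responses from one group are wrongly flagged as nonauthentic, relative to another group) can be illustrated with a per-group false positive rate. The following is a minimal sketch, not the authors' method: the function name, the label coding (0 = authentic, 1 = nonauthentic), the group codes, and the toy data are all assumptions made for illustration, and the abstract does not state which fairness metrics or effect sizes the study actually used.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: FPR} computed over authentic responses only.

    Assumed coding (hypothetical, not from the paper):
      y_true / y_pred: 0 = authentic, 1 = nonauthentic.
    FPR = (authentic responses flagged as nonauthentic) / (all authentic responses).
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only authentic responses can yield false positives
            negatives[group] += 1
            if pred == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


# Toy, made-up data purely for illustration (not results from the study).
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
groups = ["M"] * 6 + ["F"] * 6

rates = false_positive_rate_by_group(y_true, y_pred, groups)
gap = rates["M"] - rates["F"]  # positive gap: group "M" is falsely flagged more often
print(rates, gap)
```

A raw gap like this says nothing about practical significance on its own; the abstract reports that the observed differences had negligible effect sizes, which would require a separate effect-size calculation not shown here.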

Source journal
CiteScore: 2.30
Self-citation rate: 7.70%
Articles published: 46
Journal description: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings, as well as be of interest to measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.