Journal of Educational Measurement — Latest Articles

Validity Arguments for AI-Based Automated Scores: Essay Scoring as an Illustration
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-06-08 · DOI: 10.1111/jedm.12333
Steve Ferrara, Saed Qunbar
Abstract: In this article, we argue that automated scoring engines should be transparent and construct relevant—that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and directly relevant to the target assessment construct. We address the current limitations on evidence and validity arguments for scores from automated scoring engines from the points of view of the Standards for Educational and Psychological Testing (i.e., construct relevance, construct representation, and fairness) and emerging principles in Artificial Intelligence (e.g., explainable AI, an examinee's right to explanations, and principled AI). We illustrate these concepts and arguments for automated essay scores.
Journal of Educational Measurement, 59(3), 288–313.
Citations: 4
Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-06-01 · DOI: 10.1111/jedm.12335
Matthew S. Johnson, Xiang Liu, Daniel F. McCaffrey
Abstract: With the increasing use of automated scores in operational testing settings comes the need to understand the ways in which they can yield biased and unfair results. In this paper, we provide a brief survey of some of the ways in which the predictive methods used in automated scoring can lead to biased, and thus unfair, automated scores. After providing definitions of fairness from machine learning and a psychometric framework to study them, we demonstrate how modeling decisions, like omitting variables, using proxy measures or confounded variables, and even the optimization criterion in estimation, can lead to biased and unfair automated scores. We then introduce two simple methods for evaluating bias, evaluate their statistical properties through simulation, and apply them to an item from a large-scale reading assessment.
Journal of Educational Measurement, 59(3), 338–361.
Citations: 4
Toward Argument-Based Fairness with an Application to AI-Enhanced Educational Assessments
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-06-01 · DOI: 10.1111/jedm.12334
A. Corinne Huggins-Manley, Brandon M. Booth, Sidney K. D'Mello
Abstract: The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible approach to fairness arguments that occurs outside of and complementary to validity arguments is required to address many of the views on fairness that a set of assessment stakeholders may hold. Accordingly, we focus this manuscript on two contributions: (a) introducing the argument-based fairness approach to complement argument-based validity for both traditional and artificial intelligence (AI)-enhanced assessments and (b) applying it in an illustrative AI assessment of perceived hireability in automated video interviews used to prescreen job candidates. We conclude with recommendations for further advancing argument-based fairness approaches.
Journal of Educational Measurement, 59(3), 362–388.
Citations: 4
Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-30 · DOI: 10.1111/jedm.12322
Tim Moses
Abstract: One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling, or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different examinees, or tests that are administered in different modes and data collection designs. This article considers how previously proposed linking frameworks might be updated to address more recent testing situations. The first section summarizes the definitions and frameworks described in previous test linking discussions. Additional sections consider some sources of more disparate approaches to test development and administrations, as well as the implications of these for test linking. Possibilities for reflecting these features in an expanded test linking framework are proposed that encourage limited comparability, such as comparability that is restricted to subgroups or to the conditions of a linking study when a linking is produced, or within, but not across, tests or test forms when an empirical linking based on examinee data is not produced. The implications of an updated framework of previously established linking approaches are further described in a final discussion.
Journal of Educational Measurement, 59(2), 231–250.
Citations: 4
Introduction to the Special Issue "Maintaining Score Comparability: Recent Challenges and Some Possible Solutions"
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-26 · DOI: 10.1111/jedm.12323
Tim Moses, Gautam Puhan
Journal of Educational Measurement, 59(2), 137–139.
Citations: 1
Anchoring Validity Evidence for Automated Essay Scoring
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-15 · DOI: 10.1111/jedm.12336
Mark D. Shermis
Abstract: One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays. Sometimes these attributes are based on the fundamentals of writing (e.g., fluency), but quite often they are based on locally developed rubrics that may be confounded with specific content coverage expectations. This lack of transparency makes it difficult to provide systematic evidence that machine scoring is assessing writing itself, rather than slices or correlates of writing performance.
Journal of Educational Measurement, 59(3), 314–337.
Citations: 2
Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-11 · DOI: 10.1111/jedm.12318
Peter Baldwin, Brian E. Clauser
Abstract: While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way, or may be incompatible with common examinee or item designs altogether. When comparisons are necessary under these nonroutine conditions, forms still must be connected by "something," and this article focuses on these form-invariant connective "somethings." A conceptual framework for thinking about the problem of score comparability in this way is given, followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Journal of Educational Measurement, 59(2), 140–160.
Citations: 2
Recent Challenges to Maintaining Score Comparability: A Commentary
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-10 · DOI: 10.1111/jedm.12319
Neil J. Dorans, Shelby J. Haberman
Journal of Educational Measurement, 59(2), 251–264.
Citations: 0
Validating Performance Standards via Latent Class Analysis
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-05 · DOI: 10.1111/jedm.12325
Salih Binici, Ismail Cuhadar
Abstract: Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares latent class analysis results with previously established performance standards via the modified-Angoff method for cross-validation. The context of the study is an operational large-scale science assessment administered in one of the southern states in the United States. Results show that the number of classes that emerged in the latent class analysis concurs with the number of existing performance levels. In addition, there is a substantial level of agreement between latent class analysis results and the modified-Angoff method in terms of classifying students into the same performance levels. Overall, the findings establish evidence for the validity of the performance standards identified via the modified-Angoff method. Practical implications of the study findings are discussed.
Journal of Educational Measurement, 59(4), 502–516.
Citations: 1
Score Comparability Issues with At-Home Testing and How to Address Them
IF 1.3 · Q4 · Psychology
Journal of Educational Measurement · Pub Date: 2022-05-04 · DOI: 10.1111/jedm.12324
Gautam Puhan, Sooyeon Kim
Abstract: As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be used to evaluate potential mode effects at both the item and total score levels. Using operational data from a licensure test, we also compared linking relationships between the test center and at-home testing groups to determine the reporting score conversion from a subpopulation invariance perspective.
Journal of Educational Measurement, 59(2), 161–179.
Citations: 3