Journal of Educational Measurement: Latest Articles

Anchoring Validity Evidence for Automated Essay Scoring
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-15 DOI: 10.1111/jedm.12336
Mark D. Shermis
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays. Sometimes these attributes are based on the fundamentals of writing (e.g., fluency), but quite often they are based on locally developed rubrics that may be confounded with specific content coverage expectations. This lack of transparency makes it difficult to provide systematic evidence that machine scoring is assessing writing rather than mere slices or correlates of writing performance.
Citations: 2
Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-11 DOI: 10.1111/jedm.12318
Peter Baldwin, Brian E. Clauser
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way, or may be incompatible with common examinee or item designs altogether. When comparisons are necessary under these nonroutine conditions, forms still must be connected by something, and this article focuses on these form-invariant connective somethings. A conceptual framework for thinking about the problem of score comparability in this way is given, followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Citations: 2
Recent Challenges to Maintaining Score Comparability: A Commentary
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-10 DOI: 10.1111/jedm.12319
Neil J. Dorans, Shelby J. Haberman
Citations: 0
Validating Performance Standards via Latent Class Analysis
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-05 DOI: 10.1111/jedm.12325
Salih Binici, Ismail Cuhadar
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares latent class analysis results with previously established performance standards via the modified-Angoff method for cross-validation. The context of the study is an operational large-scale science assessment administered in one of the southern states in the United States. Results show that the number of classes that emerged in the latent class analysis concurs with the number of existing performance levels. In addition, there is a substantial level of agreement between the latent class analysis results and the modified-Angoff method in terms of classifying students into the same performance levels. Overall, the findings establish evidence for the validity of the performance standards identified via the modified-Angoff method. Practical implications of the study findings are discussed.
Citations: 1
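The Binici and Cuhadar study classifies examinees with latent class analysis and checks agreement with modified-Angoff performance levels. As a minimal sketch of that workflow, assuming simulated dichotomous item responses and three latent classes rather than the study's operational science assessment data, the Python code below fits a Bernoulli-mixture latent class model with EM and cross-tabulates the resulting class memberships against known performance levels; EM class labels are arbitrary, so agreement should be read after matching classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (assumption): 2,000 examinees, 20 dichotomous items,
# three "true" proficiency groups with increasing success probabilities.
n, j, k = 2000, 20, 3
true_level = rng.choice(k, size=n, p=[0.3, 0.45, 0.25])
p_correct = np.array([0.35, 0.60, 0.85])[true_level][:, None]
x = (rng.random((n, j)) < p_correct).astype(float)

# EM for a latent class (Bernoulli mixture) model with k classes.
pi = np.full(k, 1.0 / k)                      # class proportions
theta = rng.uniform(0.3, 0.7, size=(k, j))    # class-specific item success probabilities

for _ in range(200):
    # E-step: posterior class probabilities per examinee (log scale for stability)
    log_lik = x @ np.log(theta).T + (1 - x) @ np.log(1 - theta).T + np.log(pi)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    post = np.exp(log_lik)
    post /= post.sum(axis=1, keepdims=True)

    # M-step: update class proportions and item success probabilities
    nk = post.sum(axis=0)
    pi = nk / n
    theta = np.clip((post.T @ x) / nk[:, None], 1e-4, 1 - 1e-4)

assigned = post.argmax(axis=1)

# Cross-tabulate LCA classes against the (here simulated) performance levels,
# mirroring the agreement check with modified-Angoff classifications.
crosstab = np.zeros((k, k), dtype=int)
for lvl, cls in zip(true_level, assigned):
    crosstab[lvl, cls] += 1
print(crosstab)
```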
Score Comparability Issues with At-Home Testing and How to Address Them
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-04 DOI: 10.1111/jedm.12324
Gautam Puhan, Sooyeon Kim
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be used to evaluate potential mode effects at both the item and total score levels. Using operational data from a licensure test, we also compared linking relationships between the test center and at-home testing groups to determine the reporting score conversion from a subpopulation invariance perspective.
Citations: 3
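Puhan and Kim summarize item-level and total-score-level checks for mode effects. The sketch below illustrates one generic item-level screen, a standardized difference in proportion correct between the test-center and at-home groups, on simulated data; it is an assumed illustration of the kind of procedure involved, not the specific methods used in the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data (assumption): item scores (0/1) for a test-center group
# and an at-home group on the same 30 items, with a small mode effect on a few items.
n_tc, n_home, n_items = 5000, 3000, 30
p_tc = rng.uniform(0.4, 0.9, n_items)
p_home = p_tc.copy()
p_home[:3] -= 0.06                      # assumed mode effect on the first three items

tc = (rng.random((n_tc, n_items)) < p_tc).astype(int)
home = (rng.random((n_home, n_items)) < p_home).astype(int)

# Item-level check: difference in proportion correct with a pooled standard error;
# |z| beyond roughly 2.5 is used here as a screening flag for a possible mode effect.
p1, p2 = tc.mean(axis=0), home.mean(axis=0)
pooled = (tc.sum(axis=0) + home.sum(axis=0)) / (n_tc + n_home)
se = np.sqrt(pooled * (1 - pooled) * (1 / n_tc + 1 / n_home))
z = (p1 - p2) / se
flagged = np.where(np.abs(z) > 2.5)[0]
print("flagged items:", flagged)

# Total-score-level check: compare raw score distributions across modes.
print("TC mean/SD:   %.2f / %.2f" % (tc.sum(1).mean(), tc.sum(1).std()))
print("Home mean/SD: %.2f / %.2f" % (home.sum(1).mean(), home.sum(1).std()))
```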
The Impact of Cheating on Score Comparability via Pool-Based IRT Pre-equating
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-05-01 DOI: 10.1111/jedm.12321
Jinghua Liu, Kirk Becker
For any testing program that administers multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and simulated data to examine the impact of item exposure and prior knowledge on the estimation of item difficulty and test takers' ability via pool-based IRT pre-equating. Raw-to-theta transformations were derived from two groups of test takers with and without possible prior knowledge of exposed items, and these were compared to a criterion raw-to-theta transformation. Results indicated that item exposure has a large impact on item difficulty, not only altering the difficulty of exposed items but also altering the difficulty of unexposed items. Item exposure makes test takers with prior knowledge appear more able. Further, theta estimation bias for test takers without prior knowledge increases when more test takers with possible prior knowledge are in the calibration population. Score inflation occurs for test takers with and without prior knowledge, especially for those with lower abilities.
Citations: 3
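Liu and Becker study how item exposure and prior knowledge distort calibration. The simulation sketch below is an assumed, simplified stand-in: a subset of examinees answers exposed items correctly with high probability, and item difficulty is approximated by a logit of the proportion correct rather than a full IRT calibration, so it only shows the first-order shift on exposed items. The spillover to unexposed items reported in the article arises through joint calibration and scale linking, which this crude proxy does not capture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: Rasch responses for 4,000 examinees on 40 items, the first 10
# of which are "exposed." 20% of examinees have prior knowledge of exposed items.
n, j, n_exposed = 4000, 40, 10
theta = rng.normal(0, 1, n)
b = rng.normal(0, 1, j)
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
x = (rng.random((n, j)) < p).astype(float)

cheater = rng.random(n) < 0.20
x[np.ix_(cheater, np.arange(n_exposed))] = (
    rng.random((cheater.sum(), n_exposed)) < 0.95
).astype(float)  # prior knowledge: near-certain success on exposed items

# Crude difficulty estimate: logit of the proportion incorrect. (A real pre-equating
# pool would be calibrated with IRT software; this proxy is only for illustration.)
def crude_difficulty(resp):
    pv = np.clip(resp.mean(axis=0), 1e-3, 1 - 1e-3)
    return -np.log(pv / (1 - pv))

b_clean = crude_difficulty(x[~cheater])        # calibration without prior knowledge
b_contam = crude_difficulty(x)                 # calibration including it

shift = b_contam - b_clean
print("mean shift, exposed items:   %.3f" % shift[:n_exposed].mean())
print("mean shift, unexposed items: %.3f" % shift[n_exposed:].mean())
```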
Score Comparability between Online Proctored and In-Person Credentialing Exams
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-04-27 DOI: 10.1111/jedm.12320
Paul Jones, Ye Tong, Jinghua Liu, Joshua Borglum, Vince Primoli
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two test-center cohorts (TC1 and TC2) and one online-proctored cohort (OP1) matched on their pool-based scale score distributions. The calibrations from all three groups were used to score the TC2 cohort, designated the validation sample. The TC1 item parameters and TC1-based thetas and pass rates were more like the native TC2 values than the OP1-based values, indicating mode effects, but the score and pass/fail decision differences were small. In Study 2, we used a "cross-modal repeater approach" in which test takers who failed their first attempt in one modality took the test again in either the same or a different modality. The two pairs of repeater groups (TC → TC vs. TC → OP, and OP → OP vs. OP → TC) were matched exactly on their first-attempt scores. Results showed an increased pass rate and greater score variability in all conditions involving OP, with mode effects noticeable in the TC → OP condition and, less strongly, in the OP → TC condition. Limitations of the study and implications for exam developers were discussed.
Citations: 4
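The "cross-modal repeater approach" in Study 2 compares repeaters who are matched exactly on their first-attempt scores and then retest in the same or the other modality. The sketch below reproduces that comparison logic on simulated data, with an assumed retest gain and an assumed online-proctored score bump; the pass rates and variances it prints are illustrative, not the article's results.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup: first-attempt failers retest either in the same or the other
# modality. We exact-match TC->TC with TC->OP repeaters on first-attempt score,
# then compare second-attempt pass rates and score SDs. All data are simulated.
cut = 70

def simulate(n, mode_bump):
    first = rng.normal(60, 8, n).round()            # first attempt (kept below the cut)
    first = np.minimum(first, cut - 1)
    gain = rng.normal(5, 6, n)                      # assumed retest gain
    second = first + gain + mode_bump               # assumed mode effect on retest
    return first, second

tc_tc_first, tc_tc_second = simulate(2000, mode_bump=0.0)
tc_op_first, tc_op_second = simulate(2000, mode_bump=2.0)

# Exact matching on first-attempt score: keep equal numbers per score point.
matched_tc, matched_op = [], []
for s in np.unique(np.concatenate([tc_tc_first, tc_op_first])):
    i1 = np.where(tc_tc_first == s)[0]
    i2 = np.where(tc_op_first == s)[0]
    m = min(len(i1), len(i2))
    matched_tc.append(tc_tc_second[i1[:m]])
    matched_op.append(tc_op_second[i2[:m]])
matched_tc = np.concatenate(matched_tc)
matched_op = np.concatenate(matched_op)

print("TC->TC pass rate: %.3f, SD: %.2f" % ((matched_tc >= cut).mean(), matched_tc.std()))
print("TC->OP pass rate: %.3f, SD: %.2f" % ((matched_op >= cut).mean(), matched_op.std()))
```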
Random Responders in the TIMSS 2015 Student Questionnaire: A Threat to Validity?
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-04-26 DOI: 10.1111/jedm.12317
Saskia van Laar, Johan Braeken
The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers, as if they were not even reading the items and were choosing a response option at random throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment results are at risk of being invalidated. Not much is known about the prevalence or impact of such random responders in the context of international large-scale educational assessments. Following a mixture item response theory (IRT) approach, an initial investigation of both issues is conducted for the Confidence in and Value of Mathematics/Science (VoM/VoS) scales in the Trends in International Mathematics and Science Study (TIMSS) 2015 student questionnaire. We end with a call to facilitate further mapping of invalid response behavior in this context through the inclusion of instructed response items and survey completion speed indicators in the assessments, and a habit of sensitivity checks in all secondary data studies.
Citations: 0
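Van Laar and Braeken take a mixture IRT approach to separating random responders from attentive ones. A much simpler two-class sketch of the same idea appears below: one class is constrained to respond uniformly at random while the other follows estimated item-by-category probabilities, and EM yields a posterior probability of being a random responder. The data, scale length, and 5% random-responder rate are assumptions; the article's actual model is a mixture IRT model for the TIMSS 2015 VoM/VoS scales.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed setup: 4-point Likert responses to a 9-item attitude scale;
# 5% of respondents answer completely at random.
n, j, c = 3000, 9, 4
random_resp = rng.random(n) < 0.05
trait = rng.normal(0, 1, n)
probs = np.stack([np.exp(-(k - 1.5 - trait) ** 2 / 2) for k in range(c)], axis=1)
probs /= probs.sum(axis=1, keepdims=True)       # crude trait-driven category probabilities
x = np.array([rng.choice(c, size=j, p=probs[i]) for i in range(n)])
x[random_resp] = rng.integers(0, c, size=(random_resp.sum(), j))

# Two-class EM: class 0 responds uniformly at random (fixed), class 1 follows
# item-specific category probabilities (estimated). This is a simplified stand-in
# for the mixture IRT model used in the article.
pi = 0.5                                   # prior probability of the random class
cat = np.full((j, c), 1.0 / c)             # category probabilities for the attentive class
one_hot = np.eye(c)[x]                     # (n, j, c)

for _ in range(100):
    ll_rand = j * np.log(1.0 / c)
    ll_real = np.log((one_hot * cat).sum(axis=2)).sum(axis=1)
    post_rand = pi * np.exp(ll_rand) / (pi * np.exp(ll_rand) + (1 - pi) * np.exp(ll_real))
    pi = post_rand.mean()
    w = 1 - post_rand
    cat = (one_hot * w[:, None, None]).sum(axis=0) / w.sum()
    cat = np.clip(cat, 1e-4, None)
    cat /= cat.sum(axis=1, keepdims=True)

flagged = post_rand > 0.5
print("estimated random-responder rate: %.3f" % pi)
print("flag agreement with truth: %.3f" % (flagged == random_resp).mean())
```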
Detecting Differential Item Functioning Using Posterior Predictive Model Checking: A Comparison of Discrepancy Statistics
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-04-25 DOI: 10.1111/jedm.12316
Seang-Hwane Joo, Philseok Lee
This study proposes a new Bayesian differential item functioning (DIF) detection method using posterior predictive model checking (PPMC). Item fit measures including infit, outfit, the observed score distribution (OSD), and Q1 were considered as discrepancy statistics for the PPMC DIF methods. The performance of the PPMC DIF method was evaluated via a Monte Carlo simulation manipulating sample size, DIF size, DIF type, DIF percentage, and subpopulation trait distribution. Parametric DIF methods, such as Lord's chi-square and Raju's area approaches, were also included in the simulation design to compare the performance of the proposed PPMC DIF methods with existing ones. Based on Type I error and power analysis, we found that the PPMC DIF methods showed better-controlled Type I error rates than the existing methods and comparable power to detect uniform DIF. The implications and recommendations for applied researchers are discussed.
Citations: 1
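The core PPMC logic behind the method is to generate replicated data from posterior draws, compute a discrepancy statistic on observed and replicated data, and summarize the comparison as a posterior predictive p-value. The sketch below illustrates that loop with assumptions throughout: the "posterior draws" are simulated rather than taken from a real Bayesian IRT fit, and the discrepancy is a simple between-group difference in proportion correct rather than the infit, outfit, OSD, or Q1 statistics compared in the article.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed setup: observed 0/1 responses for one studied item from a reference and
# a focal group, plus posterior draws of Rasch parameters for that item and of
# each examinee's theta. Both are simulated here; in practice the draws would
# come from a Bayesian IRT fit (e.g., MCMC).
n_ref, n_foc, n_draws = 1000, 1000, 500
theta = rng.normal(0, 1, n_ref + n_foc)
group = np.r_[np.zeros(n_ref, dtype=int), np.ones(n_foc, dtype=int)]
b_used = np.where(group == 0, 0.0, 0.4)            # uniform DIF of 0.4 logits (assumption)
obs = (rng.random(theta.size) < 1 / (1 + np.exp(-(theta - b_used)))).astype(int)

# Fake "posterior draws" of the DIF-free model parameters (assumption).
theta_draws = theta + rng.normal(0, 0.15, (n_draws, theta.size))
b_draws = rng.normal(0.2, 0.05, n_draws)           # single difficulty, no DIF allowed

# Discrepancy: difference in proportion correct between groups (data-only statistic,
# so its realized value is the same across draws).
def discrepancy(y, g):
    return y[g == 0].mean() - y[g == 1].mean()

d_obs, d_rep = np.empty(n_draws), np.empty(n_draws)
for s in range(n_draws):
    p = 1 / (1 + np.exp(-(theta_draws[s] - b_draws[s])))
    y_rep = (rng.random(p.size) < p).astype(int)
    d_obs[s] = discrepancy(obs, group)
    d_rep[s] = discrepancy(y_rep, group)

ppp = (d_rep >= d_obs).mean()                      # posterior predictive p-value
print("PPP-value for the studied item: %.3f" % ppp)  # extreme values suggest DIF
```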
Two IRT Characteristic Curve Linking Methods Weighted by Information
IF 1.3, Q4 (Psychology)
Journal of Educational Measurement Pub Date: 2022-04-17 DOI: 10.1111/jedm.12315
Shaojie Wang, Minqiang Zhang, Won-Chan Lee, Feifei Huang, Zonglong Li, Yixing Li, Sufang Yu
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take parameter estimation errors into account. The item-information-weighted (IWCC) and test-information-weighted (TWCC) characteristic curve methods weight the components of the traditional loss function by the corresponding item and test information, respectively. A Monte Carlo simulation was conducted to evaluate the performance of the new linking methods and compare them with traditional ones. Ability difference between linking groups, sample size, and test length were manipulated under the common-item nonequivalent groups design. Results showed that the two information-weighted characteristic curve methods generally outperformed the traditional methods, and TWCC was found to be more accurate and stable than IWCC. A pseudo-form, pseudo-group analysis was also performed, and similar results were observed. Finally, guidelines for practice and future directions are discussed.
Citations: 0
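The weighting idea behind IWCC can be sketched as a Haebara-style characteristic curve criterion in which each item's squared ICC difference is weighted by its information. The Python code below minimizes such a criterion over the linking constants A and B for simulated 2PL common-item parameters; the exact loss functions and estimation details in the article may differ from this assumed formulation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Assumed setup: 2PL parameter estimates for 15 common items on two forms.
# Form-Y parameters are the form-X parameters transformed by A=1.1, B=0.25 plus noise.
n_items = 15
a_x = rng.uniform(0.8, 2.0, n_items)
b_x = rng.normal(0.0, 1.0, n_items)
A_true, B_true = 1.1, 0.25
a_y = a_x / A_true + rng.normal(0, 0.03, n_items)
b_y = A_true * b_x + B_true + rng.normal(0, 0.05, n_items)

quad = np.linspace(-4, 4, 41)                     # theta quadrature points

def p2pl(a, b, theta):
    return 1 / (1 + np.exp(-a * (theta[:, None] - b[None, :])))

def info_weighted_haebara(par):
    # Haebara-style criterion: squared ICC differences after transforming the
    # form-X parameters onto the form-Y scale, weighted by 2PL item information
    # (the weighting idea behind IWCC; the article's estimator may differ).
    A, B = par
    p_y = p2pl(a_y, b_y, quad)
    p_x = p2pl(a_x / A, A * b_x + B, quad)
    info = a_y ** 2 * p_y * (1 - p_y)             # item information used as the weight
    return np.sum(info * (p_y - p_x) ** 2)

res = minimize(info_weighted_haebara, x0=[1.0, 0.0], method="Nelder-Mead")
print("estimated A, B:", np.round(res.x, 3), " (true:", A_true, B_true, ")")
```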