Chinese/English Journal of Educational Measurement and Evaluation: Latest Articles

Comparing Performance of Feature Extraction Methods and Machine Learning Models in Automatic Essay Scoring
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-09-01 | DOI: 10.59863/dqiz8440
Authors: Lihua Yao, Hong Jiao
Abstract: This study used Kaggle data, the ASAP data set, and applied natural language processing (NLP) and Bidirectional Encoder Representations from Transformers (BERT) for corpus processing and feature extraction, comparing different machine learning models, both traditional machine-learning classifiers and neural-network-based approaches. Supervised learning models were used for the scoring system, where six of the eight essay prompts were trained separately and concatenated. Compared with previous studies, we found that adding more features, such as readability scores from Spacy Textsta, improved the prediction results of the essay scoring system. The neural network model, trained on all prompt data and using NLP for corpus processing and feature extraction, performed better than the other models, with an overall test quadratic weighted kappa (QWK) of 0.9724. It achieved the highest QWK of 0.859 for prompt 1 and an average QWK of 0.771 across all six prompts, making it the best-performing machine learning model tested.
Citations: 0
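The quadratic weighted kappa (QWK) reported in this abstract is the standard agreement metric in automated essay scoring. The sketch below shows one common way to compute it with scikit-learn; the score vectors are illustrative, not taken from the ASAP data.

```python
# Minimal QWK sketch (illustrative scores, not from the study above).
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 2, 3, 1, 4, 3]  # hypothetical human ratings
model_scores = [2, 3, 3, 2, 4, 1, 4, 3]  # hypothetical model predictions

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```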
Comparing Performance of Feature Extraction Methods and Machine Learning Models in Automatic Essay Scoring (Chinese version)
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-09-01 | DOI: 10.59863/vlgu9815
Authors: Lihua Yao, Hong Jiao
Abstract: This study applied feature extraction and machine learning methods to Kaggle data, namely the ASAP data set. Specifically, natural language processing (NLP) and Bidirectional Encoder Representations from Transformers (BERT) were used for corpus processing and feature extraction, and different machine learning models were compared, including traditional machine-learning classifiers and neural-network-based approaches. Supervised learning models were used for the scoring system, with six of the eight essay prompts trained separately or jointly. Compared with previous studies, we found that (1) adding more features, such as readability scores from Spacy Textsta, improved the predictive performance of the essay scoring system, and (2) the neural network model that used NLP for corpus processing and feature extraction performed better than the other models when trained on all prompts jointly, with an overall quadratic weighted kappa (QWK) of 0.9724. Prompt 1 had the highest QWK, 0.859, and the average QWK across all six prompts was 0.771.
Citations: 0
Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items (Chinese version)
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-09-01 | DOI: 10.59863/nzbo8811
Authors: Maggie Beiting-Parrish, John Whitmer
Abstract: Natural language processing (NLP) is widely used across content areas to predict human scores for students' open-ended responses (Johnson et al., 2022). Ensuring that the algorithms are fair with respect to student demographic factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of the six top-performing entries in a data challenge involving 20 NAEP reading comprehension items, which were originally analyzed for fairness with respect to race/ethnicity and gender. The study describes additional fairness evaluations covering English Language Learner (ELL) status, Individualized Education Plans, and Free/Reduced-Price Lunch. Many items showed lower accuracy in score prediction, most notably for ELLs. The study recommends including additional demographic factors in fairness evaluations of automated scoring and, likewise, that fairness analyses consider multiple factors and contexts.
Citations: 0
Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-09-01 | DOI: 10.59863/nkcj9608
Authors: Maggie Beiting-Parrish, John Whitmer
Abstract: Natural language processing (NLP) is widely used to predict human scores for open-ended student assessment responses in various content areas (Johnson et al., 2022). Ensuring algorithmic fairness based on student demographic background factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of six top-performing entries from a data challenge involving 20 NAEP reading comprehension items that were initially analyzed for fairness based on race/ethnicity and gender. This study describes additional fairness evaluation including English Language Learner (ELL) status, Individual Education Plans, and Free/Reduced-Price Lunch. Several items showed lower accuracy for predicted scores, particularly for ELLs. This study recommends considering additional demographic factors in fairness scoring evaluations and that fairness analysis should consider multiple factors and contexts.
Citations: 0
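One common way to operationalize the kind of subgroup check this abstract describes is to compare agreement between automated and human scores within each demographic group. The sketch below is a minimal illustration, not the authors' analysis; the data and column names are hypothetical.

```python
# Hypothetical subgroup fairness check: exact agreement and mean machine-minus-human
# score difference by ELL status. Data and column names are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "human": [2, 3, 1, 4, 2, 3, 1, 2],
    "model": [2, 3, 2, 4, 1, 3, 1, 3],
    "ell":   ["yes", "no", "yes", "no", "yes", "no", "no", "yes"],
})

for group, sub in df.groupby("ell"):
    agreement = (sub["human"] == sub["model"]).mean()
    mean_diff = (sub["model"] - sub["human"]).mean()
    print(f"ELL={group}: exact agreement={agreement:.2f}, "
          f"mean score difference={mean_diff:+.2f} (n={len(sub)})")
```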
Careless Case Detection in Practice Tests
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-09-01 | DOI: 10.59863/ahsa2170
Authors: Steven Nydick
Abstract: This paper presents a novel method that uses a machine learning model to detect careless responding on low-stakes practice tests. Rather than classifying examinees' responses as careless based on model-fit statistics or known ground truth, we build a model that predicts significant changes in test scores between the practice test and the operational test from attributes of the practice-test items. We draw on assumptions about how careless examinees respond to items to extract features from the practice-test items, use cross-validation to optimize the model's out-of-sample prediction, and reduce heteroscedasticity when predicting the nearest operational test. All analyses use data from the practice and operational versions of the Duolingo English Test. We discuss the implications of using machine learning models to predict careless responding relative to other popular approaches.
Citations: 0
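As a rough illustration of the modeling setup this abstract describes, the sketch below fits a cross-validated regression that predicts the practice-to-operational score change from practice-test features. Everything here is an assumption for illustration: the feature names, the model choice, and the simulated data are not taken from the paper.

```python
# Hypothetical sketch of predicting practice-to-operational score change
# with cross-validation. Features, model, and data are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(30, 10, n),   # mean item response time in seconds (hypothetical feature)
    rng.beta(2, 8, n),       # share of very fast responses (hypothetical feature)
    rng.normal(100, 15, n),  # practice-test score (hypothetical feature)
])
y = rng.normal(0, 5, n)      # simulated operational-minus-practice score change

model = GradientBoostingRegressor(random_state=0)
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
print(f"Cross-validated MAE: {mae:.2f}")
```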
An Application of Theil Indexes for Interrater Reliability: A Comparison with Intraclass Correlations (Chinese version)
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-08-01 | DOI: 10.59863/bner9428
Authors: Tianshu Pan, Yue Yin
Abstract: This paper proposes applying Theil-index ratios to interrater reliability. We discuss the theoretical foundations and examine them with real data. The results show that intraclass correlation (ICC) and Theil-index ratio results are highly correlated. However, ICC estimates may underestimate interrater reliability in the presence of extreme disagreement among raters and are more susceptible to such extreme disagreement than Theil-index ratios. Because Theil-index ratios overcome the limitations of ICC to some degree, they provide an alternative way to evaluate interrater reliability, at least under certain conditions, for example, when outliers exist in the data, when variance components are difficult to estimate, or when ICC underestimates interrater reliability.
Citations: 0
The Language of 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Postsecondary Graduates
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-08-01 | DOI: 10.59863/oivi3767
Authors: G. Orona, O. Liu, Richard Arum
Abstract: The onus of preparing skilled employees for the modern workforce is largely placed on institutions of higher education. However, recent surveys consistently show a skills gap between what employers desire and what graduates possess. This review engages this discussion in the context of measuring and assessing 21st century skills. We begin by succinctly reviewing literature pertaining to the skills gap, including what types of skills are commonly referenced, before moving to examine literature on the relations between current 21st century skills and job-related outcomes. Finally, we conclude with recommendations for higher education researchers examining skill development. Our recommendations cover three key corresponding areas: theories of cognitive development, intervention design, and measurement and assessment.
Citations: 0
The Language of 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Postsecondary Graduates (Chinese version)
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-08-01 | DOI: 10.59863/wzuf7282
Authors: G. Orona, O. Liu, Richard Arum
Abstract: Institutions of higher education bear the responsibility of preparing skilled employees for the modern workforce. However, recent surveys consistently show a gap between the skills employers expect and those graduates possess. This review discusses that gap in the context of measuring and assessing 21st century skills. We begin by briefly reviewing the literature on the skills gap, including which types of skills are most commonly referenced, and then examine the literature on the relations between current 21st century skills and job-related outcomes. Finally, we conclude with recommendations for higher education researchers studying skill development. Our recommendations cover three key corresponding areas: theories of cognitive development, intervention design, and measurement and assessment.
Citations: 0
An Application of Theil Indexes for Interrater Reliability: A Comparison with Intraclass Correlations
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-08-01 | DOI: 10.59863/wddk7257
Authors: Tianshu Pan, Yue Yin
Abstract: This study proposes to apply Theil-index ratios to interrater reliability. We discuss the theoretical foundations and examine their function empirically using real data. Our analyses show that Theil-index ratios and intraclass correlation (ICC) estimates are highly correlated. However, ICC may underestimate interrater reliability because of extreme disagreement among raters and is more likely to be influenced by such extreme disagreement. As Theil-index ratios overcome the limitations of ICC to some degree, they provide an alternative for evaluating interrater reliability, at least under certain conditions, e.g., when outliers exist in the data, when it is difficult to obtain variance component estimates, or when ICC underestimates interrater reliability.
Citations: 0
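For context, the standard Theil index of a set of nonnegative values is given below as a worked formula. The abstract does not spell out the authors' specific ratio construction for rater pairs, so this is only the underlying inequality measure the paper builds on, not their exact statistic.

```latex
% Standard Theil index of nonnegative values x_1, ..., x_n (background formula only;
% the paper's Theil-index ratio for rater agreement is not reproduced here).
T = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i}{\bar{x}}
    \ln\!\left(\frac{x_i}{\bar{x}}\right),
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```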
flexCDMs: A Web-based Platform for Cognitive Diagnostic Data Analysis
Chinese/English Journal of Educational Measurement and Evaluation | Pub Date: 2023-06-28 | DOI: 10.59863/osdb8732
Authors: Dongbo Tu, Yong Liu, Xuliang Gao, Yan Cai
Abstract: Cognitive diagnosis is an important component of modern measurement theory and has received widespread attention from researchers in education and psychological measurement. Existing cognitive diagnosis analysis tools rely on professional software packages (such as R packages), which creates significant challenges for users, especially those who are not familiar with computer programming. To remove this technical barrier, our team has developed a web-based, user-friendly platform, named flexCDMs, for cognitive diagnosis data analysis. This article describes the features of the platform, its functional modules, the implemented cognitive diagnosis models (CDMs) and algorithms, and illustrates how the platform is operated. The platform can analyze data with various cognitive diagnosis models and carry out Q-matrix theory analysis, model-data fit tests, parameter estimation, quality analysis of cognitive diagnostic tests, differential item functioning (DIF) detection, and Q-matrix modification. It produces various charts and graphs to report results. It is a powerful yet easy-to-use cognitive diagnosis data analysis tool. The flexCDMs platform is available at: http://111.230.233.68:1001/?Id=false&Block
Citations: 0
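As a concrete illustration of the kind of model such a platform fits, the item response function of the DINA model, one widely used CDM, is shown below. This is added for context only; whether flexCDMs implements this exact model is an assumption, not a claim from the article.

```latex
% DINA model item response function (illustrative CDM; inclusion in flexCDMs is assumed,
% not stated in the abstract above).
% s_j: slip parameter; g_j: guessing parameter; q_{jk}: Q-matrix entry;
% alpha_k: examinee mastery indicator for attribute k.
P(X_j = 1 \mid \boldsymbol{\alpha})
  = (1 - s_j)^{\eta_j}\, g_j^{\,1 - \eta_j},
\qquad
\eta_j = \prod_{k=1}^{K} \alpha_k^{\,q_{jk}} .
```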