{"title":"Comparing Performance of Feature Extraction Methods and Machine Learning Models in Automatic Essay Scoring","authors":"Lihua Yao, Hong Jiao","doi":"10.59863/dqiz8440","DOIUrl":"https://doi.org/10.59863/dqiz8440","url":null,"abstract":"This study used Kaggle data (the ASAP data set) and applied NLP and Bidirectional Encoder Representations from Transformers (BERT) for corpus processing and feature extraction, together with different machine learning models, including both traditional machine-learning classifiers and neural-network-based approaches. Supervised learning models were used for the scoring system, where six out of the eight essay prompts were trained separately and concatenated. Compared with previous studies, we found that adding more features such as readability scores using Spacy Textsta improved the prediction results for the essay scoring system. The neural network model, trained on all prompt data and utilizing NLP for corpus processing and feature extraction, performed better than the other models, with an overall test quadratic weighted kappa (QWK) of 0.9724. It achieved the highest QWK score of 0.859 for prompt 1 and an average QWK of 0.771 across all 6 prompts, making it the best-performing machine learning model that was tested.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74440964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
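{"title":"Lessons on Fairness Evaluation Learned from a Data Challenge on Automated Scoring of NAEP Reading Items","authors":"Maggie Beiting-Parrish, John Whitmer","doi":"10.59863/nzbo8811","DOIUrl":"https://doi.org/10.59863/nzbo8811","url":null,"abstract":"Natural language processing (NLP) is widely used across content areas to predict human scores for students' open-ended responses (Johnson et al., 2022). Ensuring algorithmic fairness with respect to student demographic factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of the six top-performing entries in a data challenge involving 20 NAEP reading comprehension items, which were initially analyzed for fairness based on race and gender. It describes additional fairness evaluations covering English Language Learner (ELL) status, Individual Education Plans, and free/reduced-price lunch. Many items showed lower accuracy in score prediction, most notably for ELLs. This study recommends including additional demographic factors in scoring fairness evaluations and notes that fairness analysis needs to consider multiple factors and contexts.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135737278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}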
{"title":"Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items","authors":"Maggie Beiting-Parrish, John Whitmer","doi":"10.59863/nkcj9608","DOIUrl":"https://doi.org/10.59863/nkcj9608","url":null,"abstract":"Natural language processing (NLP) is widely used to predict human scores for open-ended student assessment responses in various content areas (Johnson et al., 2022). Ensuring algorithmic fairness based on student demographic background factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of six top-performing entries from a data challenge involving 20 NAEP reading comprehension items that were initially analyzed for fairness based on race/ethnicity and gender. This study describes additional fairness evaluations covering English Language Learner (ELL) status, Individual Education Plans, and Free/Reduced-Price Lunch. Several items showed lower accuracy for predicted scores, particularly for ELLs. This study recommends including additional demographic factors in scoring fairness evaluations and considering multiple factors and contexts in fairness analysis.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135737279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Language of 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Postsecondary Graduates","authors":"G. Orona, O. Liu, Richard Arum","doi":"10.59863/oivi3767","DOIUrl":"https://doi.org/10.59863/oivi3767","url":null,"abstract":"The onus of preparing skilled employees for the modern workforce is largely placed on institutions of higher education. However, recent surveys consistently show a skills gap between what employers desire and what graduates possess. This review engages this discussion in the context of measuring and assessing 21st century skills. We begin by succinctly reviewing literature pertaining to the skills gap, including what types of skills are commonly referenced, before moving to examine literature indicating the relations between current 21st century skills and job-related outcomes. Finally, we conclude with recommendations for higher education researchers examining skill development. Our recommendations cover three key corresponding areas: theories of cognitive development, intervention design, and measurement and assessment.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83066599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
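{"title":"Perspectives on 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Higher Education Graduates","authors":"G. Orona, O. Liu, Richard Arum","doi":"10.59863/wzuf7282","DOIUrl":"https://doi.org/10.59863/wzuf7282","url":null,"abstract":"Institutions of higher education bear the responsibility of preparing skilled employees for the modern workforce. However, recent surveys consistently show a gap between the skills employers expect and those graduates possess. This review discusses that gap in the context of measuring and assessing 21st century skills. We first briefly review the literature on the skills gap (including which types of skills are most commonly mentioned), then examine the literature on the relations between current 21st century skills and job-related outcomes. Finally, we offer recommendations for higher education researchers studying skill development. Our recommendations cover three key related areas: theories of cognitive development, intervention design, and measurement and assessment.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88772158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}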
{"title":"An Application of Theil Indexes for the Interrater Reliability: A Comparison with Intraclass Correlations","authors":"Tianshu Pan, Yue Yin","doi":"10.59863/wddk7257","DOIUrl":"https://doi.org/10.59863/wddk7257","url":null,"abstract":"This study proposes applying Theil-index ratios to assess interrater reliability. We discuss the theoretical foundations and examine how the ratios behave empirically using real data. Our analyses show that Theil-index ratios and intraclass correlation (ICC) estimates are highly correlated. However, ICC may underestimate interrater reliability when there is some extreme disagreement among raters, and it is more likely to be influenced by such extreme disagreement. As Theil-index ratios overcome the limitations of ICC to some degree, they seem to provide an alternative for evaluating interrater reliability, at least under certain conditions, e.g., when outliers exist in the data, when it is difficult to obtain the variance component estimates, or when ICC underestimates interrater reliability.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"130 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74896897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"flexCDMs: A Web-based Platform for Cognitive Diagnostic Data Analysis","authors":"Dongbo Tu, Yong Liu, Xuliang Gao, Yan Cai","doi":"10.59863/osdb8732","DOIUrl":"https://doi.org/10.59863/osdb8732","url":null,"abstract":"Cognitive diagnosis is an important component of modern measurement theory and has received widespread attention from researchers in the fields of education and psychological measurement. Existing cognitive diagnosis analysis tools rely on professional software packages (such as R packages), which creates significant challenges for users, especially those who are not familiar with computer programming. To remove this technical barrier, our team has developed a web-based, user-friendly platform, named flexCDMs, for cognitive diagnosis data analysis. This article describes the features of the platform, the functional modules, the implemented cognitive diagnosis models (CDMs) and algorithms, and illustrates the operations of the platform. This platform can be used to analyze data based on various cognitive diagnosis models, carry out Q-matrix theory, model-data fit tests, parameter estimation, quality analysis of cognitive diagnostic tests, differential item functioning (DIF) detection, and Q-matrix modification. It produces various charts and graphs to report results. It is a powerful, yet easy to use cognitive diagnosis data analysis tool. The website for the flexCDMs platform is: http://111.230.233.68:1001/?Id=false&Block","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79149589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}