{"title":"Bilevel Topic Model-Based Multitask Learning for Constructed-Responses Multidimensional Automated Scoring and Interpretation","authors":"Jiawei Xiong, Feiming Li","doi":"10.1111/emip.12550","DOIUrl":"10.1111/emip.12550","url":null,"abstract":"<p>Multidimensional scoring evaluates each constructed-response answer on more than one rating dimension and/or trait, such as lexicon, organization, and supporting ideas, instead of only one holistic score, to help students distinguish between various dimensions of writing quality. In this work, we present a bilevel learning model that combines two objectives: multidimensional automated scoring, and the analysis and interpretation of students’ writing structure. The dual objectives are enabled by a supervised model, called Latent Dirichlet Allocation Multitask Learning (LDAMTL), which integrates a topic model and a multitask learning model with an attention mechanism. Two empirical data sets were employed to evaluate LDAMTL model performance. On one hand, results suggested that LDAMTL achieved better scoring performance and QW-<i>κ</i> values than two competitor models, supervised latent Dirichlet allocation and Bidirectional Encoder Representations from Transformers, at the 5% significance level. On the other hand, extracted topic structures revealed that students with a higher language score tended to employ more compelling words to support the argument in their answers. This study suggested that LDAMTL not only demonstrates strong model performance by combining the underlying shared representation of each topic with the learned representations from the neural networks but also helps in understanding students’ writing.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 2","pages":"42-61"},"PeriodicalIF":2.0,"publicationDate":"2023-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47037574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning Approach for the Simultaneous Detection of Preknowledge in Examinees and Items When Both Are Unknown","authors":"Yiqin Pan, James A. Wollack","doi":"10.1111/emip.12543","DOIUrl":"10.1111/emip.12543","url":null,"abstract":"<p>Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend the work of PW to an approach that simultaneously detects compromised items and examinees with item preknowledge, and we draw on ideas from ensemble learning to relax several limitations of the work of PW. The suggested approach also provides a confidence score, based on an autoencoder, that represents our confidence that the detection result truly corresponds to item preknowledge. Simulation studies indicate that the proposed approach performs well in the detection of item preknowledge and that the confidence score can provide helpful information for users.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"76-98"},"PeriodicalIF":2.0,"publicationDate":"2023-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42758240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cheating Detection of Test Collusion: A Study on Machine Learning Techniques and Feature Representation","authors":"Shun-Chuan Chang, Keng Lun Chang","doi":"10.1111/emip.12538","DOIUrl":"10.1111/emip.12538","url":null,"abstract":"<p>Machine learning has evolved and expanded as an interdisciplinary research method for the educational sciences. However, using machine learning techniques to detect test collusion among multiple examinees, or sets of examinees with unusual answer patterns, has remained relatively unexplored. This study investigates collusion on multiple-choice tests by introducing feature representation methodologies and machine learning algorithms that can be used jointly not only to detect individual examinees involved in collusion but also to evaluate test collusion with or without groups of potentially dishonest examinees identified a priori. Furthermore, using small-sample examples, the study articulates visual detection procedures that help identify questionable item response groups while simultaneously focusing on the specific individuals providing anomalous answers.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 2","pages":"62-73"},"PeriodicalIF":2.0,"publicationDate":"2023-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45173876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Score or Not to Score: Factors Influencing Performance and Feasibility of Automatic Content Scoring of Text Responses","authors":"Torsten Zesch, Andrea Horbach, Fabian Zehner","doi":"10.1111/emip.12544","DOIUrl":"10.1111/emip.12544","url":null,"abstract":"<p>In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic <i>variance</i> seen in the responses and that this variance is indirectly influenced by other factors such as target population or input modality. Extending previous work, we distinguish <i>conceptual</i>, <i>realization</i>, and <i>nonconformity variance</i>, which are differentially impacted by the various factors. While conceptual variance relates to different concepts embedded in the text responses, realization variance refers to their diverse manifestation through natural language. Nonconformity variance is added by aberrant response behavior. Furthermore, besides its performance, the feasibility of using an automatic scoring system depends on external factors, such as ethical or computational constraints, which influence whether a system with a given performance is accepted by stakeholders. Our work provides (i) a framework for assessment practitioners to decide a priori whether automatic content scoring can be successfully applied in a given setup as well as (ii) new empirical findings and the integration of empirical findings from the literature on factors that influence automatic systems' performance.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"44-58"},"PeriodicalIF":2.0,"publicationDate":"2023-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12544","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44028287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Papers: Leveraging Measurement for Better Decisions","authors":"","doi":"10.1111/emip.12546","DOIUrl":"10.1111/emip.12546","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"7"},"PeriodicalIF":2.0,"publicationDate":"2023-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48049149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Section “Issues and Practice in Applying Machine Learning in Educational Measurement”","authors":"Zhongmin Cui","doi":"10.1111/emip.12547","DOIUrl":"10.1111/emip.12547","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"8"},"PeriodicalIF":2.0,"publicationDate":"2023-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12547","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42293990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Literacy for Measurement Professionals: A Practical Tutorial","authors":"Rui Nie, Qi Guo, Maxim Morin","doi":"10.1111/emip.12539","DOIUrl":"10.1111/emip.12539","url":null,"abstract":"<p>The COVID-19 pandemic has accelerated the digitalization of assessment, creating new challenges for measurement professionals, including big data management, test security, and analyzing new validity evidence. In response to these challenges, <i>Machine Learning</i> (ML) emerges as an increasingly important skill in the toolbox of measurement professionals in this new era. However, most ML tutorials are technical and concept-focused. Therefore, this tutorial aims to provide a practical introduction to ML in the context of educational measurement. We supplement our tutorial with several examples of supervised and unsupervised ML techniques applied to marking a short-answer question. Python code is available on GitHub. Finally, common misconceptions about ML are discussed.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"9-23"},"PeriodicalIF":2.0,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48299064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments","authors":"Benjamin R. Shear","doi":"10.1111/emip.12540","DOIUrl":"10.1111/emip.12540","url":null,"abstract":"<p>In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about impacts of the pandemic on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to making valid inferences about student learning when using state assessment data to study pandemic impacts: measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches (the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates) that can support more valid inferences about student learning during the pandemic and in other scenarios in which the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that controlling statistically for prepandemic demographic differences can reverse conclusions about which groups were most affected by the pandemic and about decisions for prioritizing resources.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"99-109"},"PeriodicalIF":2.0,"publicationDate":"2023-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44143301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning–Based Profiling in Test Cheating Detection","authors":"Huijuan Meng, Ye Ma","doi":"10.1111/emip.12541","DOIUrl":"10.1111/emip.12541","url":null,"abstract":"<p>In recent years, machine learning (ML) techniques have received more attention for detecting aberrant test-taking behaviors due to their advantages over traditional data forensics methods. However, defining “True Test Cheaters” is challenging: unlike other fraud detection tasks, such as flagging forged bank checks or credit card fraud, testing organizations often lack physical evidence to identify “True Test Cheaters” for training ML models. This study proposed a statistically defensible method of labeling “True Test Cheaters” in the data, demonstrated the effectiveness of using ML approaches to identify irregular statistical patterns in exam data, and established an analytical framework for evaluating and conducting real-time ML-based test data forensics. Classification accuracy and false negative/positive results are evaluated across different supervised ML techniques. The reliability and feasibility of operationally using this approach for an IT certification exam are evaluated using real data.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 1","pages":"59-75"},"PeriodicalIF":2.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48429707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Psychometric Evaluation of the Preschool Early Numeracy Skills Test–Brief Version Within the Item Response Theory Framework","authors":"Nikolaos Tsigilis, Katerina Krousorati, Athanasios Gregoriadis, Vasilis Grammatikopoulos","doi":"10.1111/emip.12536","DOIUrl":"10.1111/emip.12536","url":null,"abstract":"<p>The Preschool Early Numeracy Skills Test–Brief Version (PENS-B) is a measure of early numeracy skills, developed and mainly used in the United States. The purpose of this study was to examine the factorial validity and measurement invariance across gender of PENS-B in the Greek educational context. PENS-B was administered to 906 preschool children (473 boys, 433 girls), randomly selected from 84 kindergarten classrooms. Unidimensional and multidimensional 2PL item response theory models, with cross-validation procedures, were used to analyze the data. Results showed that responses to 20 items can be adequately explained by a two-dimensional model (Numbering Relations and Arithmetic Operations). Application of differential item functioning procedures did not detect any gender bias. Numbering Relations comprises 16 items, which assess low levels of this latent trait. On the other hand, four items capture average levels of Arithmetic Operations. Total information curves revealed that both dimensions measure only a small range of their underlying latent trait with precision.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 2","pages":"32-41"},"PeriodicalIF":2.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12536","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47069473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}