{"title":"An Investigation of the Nature and Consequence of the Relationship between IRT Difficulty and Discrimination","authors":"Sandra M. Sweeney, Sandip Sinharay, Matthew S. Johnson, Eric W. Steinhauer","doi":"10.1111/emip.12522","DOIUrl":"10.1111/emip.12522","url":null,"abstract":"<p>The focus of this paper is on the empirical relationship between item difficulty and item discrimination. Two studies—an empirical investigation and a simulation study—were conducted to examine the association between item difficulty and item discrimination under classical test theory and item response theory (IRT), and the effects of the association on various quantities of interest. Results from the empirical investigation show that item difficulty and item discrimination are negatively correlated under classical test theory, mostly negatively correlated under the two-parameter logistic model, and mostly positively correlated under the three-parameter logistic model; the magnitude of the correlation varied over the different data sets. Results from the simulation study reveal that a failure to incorporate the correlation between item difficulty and item discrimination in IRT simulations may provide the investigator with inaccurate values of important quantities of interest, and may lead to incorrect operational decisions. Implications to practice and future directions are discussed.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 4","pages":"50-67"},"PeriodicalIF":2.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45419783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Module 29: Multidimensional Item Response Theory Equating","authors":"Stella Y. Kim","doi":"10.1111/emip.12525","DOIUrl":"10.1111/emip.12525","url":null,"abstract":"<p>In this digital ITEMS module, Dr. Stella Kim provides an overview of multidimensional item response theory (MIRT) equating. Traditional unidimensional item response theory (IRT) equating methods impose the sometimes untenable restriction on data that only a single ability is assessed. This module discusses potential sources of multidimensionality and presents potential consequences of multidimensionality on equating. To remedy these effects, MIRT equating can be used as a viable alternative to traditional methods of IRT equating. In conducting MIRT equating, the choice of an appropriate MIRT model is necessary, and thus the module describes several existing MIRT models and illustrates each using hypothetical examples. After a brief description of MIRT models, an extensive review of the current literature is presented to identify gaps in the literature on MIRT equating. Then, the steps for conducting MIRT observed-score equating are described. Finally, the module discusses practical considerations in applying MIRT equating to testing practices and suggests potential areas of research for future studies.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 3","pages":"85-86"},"PeriodicalIF":2.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12525","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44828478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Person Infit Density Contour","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12526","DOIUrl":"10.1111/emip.12526","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 3","pages":"4"},"PeriodicalIF":2.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43771777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Special Case of Brennan's Index for Tests That Aim to Select a Limited Number of Students: A Monte Carlo Simulation Study","authors":"Serkan Arikan, Eren Can Aybek","doi":"10.1111/emip.12528","DOIUrl":"10.1111/emip.12528","url":null,"abstract":"<p>Many scholars compared various item discrimination indices in real or simulated data. Item discrimination indices, such as item-total correlation, item-rest correlation, and IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number of students, examinees, or candidates for allocated schools and job positions. Thus, there is a need to evaluate the performances of CTT and IRT item discrimination indices when the test purpose is to select a limited number of students. The purpose of the current Monte Carlo study is to evaluate item discrimination indices in the case of selecting a limited number of high-achieving students. The results showed that a special case of Brennan's index, <i>B</i><sub>10–90</sub>, provided more accurate information for this specific test purpose. Additionally, the effects of various factors, such as test length, ability distributions of examinees, and item difficulty variance on item discrimination indices were investigated. The performance of each item discrimination index is discussed in detail.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 4","pages":"35-49"},"PeriodicalIF":2.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43556492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting the Interpretive Validity of Student-Level Claims in Science Assessment with Tiered Claim Structures","authors":"Sanford R. Student, Brian Gong","doi":"10.1111/emip.12523","DOIUrl":"10.1111/emip.12523","url":null,"abstract":"<p>We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from summative assessments can support. As a solution, we propose tiered claims, which explicitly distinguish between claims about what students have done or can do on test items—which are typically easier to support under current test designs—and claims about what students could do in the broader domain of performances described by the standards, for which novel evidence is likely required. We discuss the positive implications of tiered claims for test construction, validation, and reporting of results.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 4","pages":"68-78"},"PeriodicalIF":2.0,"publicationDate":"2022-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49048947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Average Rank and Adjusted Rank Are Better Measures of College Student Success than GPA","authors":"Donald Wittman","doi":"10.1111/emip.12521","DOIUrl":"10.1111/emip.12521","url":null,"abstract":"<p>I show that there are better measures of student college performance than grade point average (GPA) by undertaking a fine-grained empirical investigation of grading within a large public university. The value of using GPA as a measure of comparative performance is undermined by academically weaker students taking courses where the grading is more generous. In fact, college courses composed of <i>weaker</i> performing students (whether measured by their relative performance in other classes, SAT scores, or high school GPA) have <i>higher</i> average grades. To partially correct for idiosyncratic grading across classes, alternative measures, student class rank and the student's average class rank, are introduced. In comparison to a student's lower-division grade, the student's lower-division <i>rank</i> is a better predictor of the student's grade in the upper-division course. Course rank and course grade are adjusted to account for different levels of academic competitiveness across courses (more precisely, student fixed-effects are derived). SAT scores and high school GPA are then used to predict college performance. Higher explained variation (<i>R</i><sup>2</sup>) is obtained when the dependent variable is average class rank rather than GPA. Still higher explained variation occurs when the dependent variable is adjusted rank.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 4","pages":"23-34"},"PeriodicalIF":2.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12521","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45018605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconceptualization of Coefficient Alpha Reliability for Test Summed and Scaled Scores","authors":"Rashid S. Almehrizi","doi":"10.1111/emip.12520","DOIUrl":"10.1111/emip.12520","url":null,"abstract":"<p>Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores, they are not appropriate to extend coefficient alpha to correctly estimate the reliability for nonlinearly transformed scaled scores such as percentile ranks and stanines. The current paper reconceptualizes coefficient alpha as a complement of the ratio of two unbiased estimates of the summed score variance. These include conditional summed score variance assuming uncorrelated item scores (gives the error score variance) and unconditional summed score variance incorporating intercorrelated item scores (gives the observed score variance). Using this reconceptualization, a new equation of coefficient generalized alpha is introduced for scaled scores. Coefficient alpha is a special case of this new equation since the latter reduces to coefficinet alpha if the scaled scores are the summed scores themselves. Two applications (cognitive and psychological assessments) are used to compare the performance (estimation and bootstrap confidence interval) of the reliability coefficients for different scaled scores. Results support the new equation of coefficient generalized alpha and compare it to coefficient generalized beta for parallel test forms. Coefficient generalized alpha produced different reliability values, which were larger than coefficient generalized beta for different scaled scores.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 3","pages":"38-47"},"PeriodicalIF":2.0,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45740678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communicating Measurement Outcomes with (Better) Graphics","authors":"J. Carl Setzer, Zhongmin Cui","doi":"10.1111/emip.12519","DOIUrl":"10.1111/emip.12519","url":null,"abstract":"<p>Data visualization is a core tenet of communicating measurement research and outcomes. Measurement professionals utilize data visualization in various phases of research, including exploration and communication. However, data visualization has not received enough attention in the measurement field. While it is true that many measurement graphics are relatively standard, many others are not and there is a wide variety of visualization quality and effectiveness seen in measurement journals. This article provides an overview of the current data visualization trends in measurement and provides some general tips for effective data visualization, with examples. This article is not a comprehensive treatise on data visualization. Therefore, we provide some resources for additional reading. Finally, we call on the measurement community to pay greater attention to the details of data visualization. We also call on measurement training programs to emphasize statistical reasoning through data visualization.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 3","pages":"5-13"},"PeriodicalIF":2.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43990870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Indicators for Item Preknowledge","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12507","DOIUrl":"10.1111/emip.12507","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"41 2","pages":"6"},"PeriodicalIF":2.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12507","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46252186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}