{"title":"An Application of the Partial Credit IRT Model in Identifying Benchmarks for Polytomous Rating Scale Instruments.","authors":"Enis Dogan","doi":"10.7275/1cf3-aq56","DOIUrl":"https://doi.org/10.7275/1cf3-aq56","url":null,"abstract":"Several large scale assessments include student, teacher, and school background questionnaires. Results from such questionnaires can be reported for each item separately, or as indices based on aggregation of multiple items into a scale. Interpreting scale scores is not always an easy task though. In disseminating results of achievement tests, one solution to this conundrum is to identify cut scores on the reporting scale in order to divide it into achievement levels that correspond to distinct knowledge and skill profiles. This allows for the reporting of the percentage of students at each achievement level in addition to average scale scores. Dividing a scale into meaningful segments can, and perhaps should, be done to enrich interpretability of scales based on questionnaire items as well. This article illustrates an approach based on an application of Item Response Theory (IRT) to accomplish this. The application is demonstrated with a polytomous rating scale instrument designed to measure students’ sense of school belonging.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82243844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Simulation to Implementation: Two CAT Case Studies","authors":"John J. Barnard","doi":"10.7275/BWVG-D091","DOIUrl":"https://doi.org/10.7275/BWVG-D091","url":null,"abstract":"Measurement specialists strive to shorten assessment time without compromising precision of scores. Computerized Adaptive Testing (CAT) has rapidly gained ground over the past decades to fulfill this goal. However, parameters for implementation of CATs need to be explored in simulations before implementation so that it can be determined whether expectations can be met. CATs can become costly if trial-and-error strategies are followed and especially if constraints are included in the algorithms, simulations can save time and money. In this study it was found that for both a multiplechoice question test and a rating scale questionnaire, simulations not only predicted outcomes for CATs very well, but also illustrated the efficiency of CATs when compared to fixed length tests.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80229329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fairness Concerns of Discrete Option Multiple Choice Items.","authors":"Carol Eckerly, Russell J. Smith, J. Sowles","doi":"10.7275/JBRV-4E93","DOIUrl":"https://doi.org/10.7275/JBRV-4E93","url":null,"abstract":"The Discrete Option Multiple Choice (DOMC) item format was introduced by Foster and Miller (2009) with the intent of improving the security of test content. However, by changing the amount and order of the content presented, the test taking experience varies by test taker, thereby introducing potential fairness issues. In this paper we investigated fairness concerns by evaluating the impact on test takers of the differing testing experiences when items are administered in the DOMC format. Specifically, we described the impact of the presentation order of the key on item difficulty and discrimination as well as the cumulative impact at the test level. We recommend not including DOMC items in exams until the methodology of scoring test takers on these items is revised to address specific fairness concerns identified in this paper.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87440946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Subject Matter Experts’ Perceptions and Job Analysis Surveys","authors":"Adam E. Wyse, Ben Babcock","doi":"10.7275/7DEY-ZD62","DOIUrl":"https://doi.org/10.7275/7DEY-ZD62","url":null,"abstract":"Two common approaches for performing job analysis in credentialing programs are committee-based methods, which rely solely on subject matter experts’ judgments, and task inventory surveys. This study evaluates how well subject matter experts’ perceptions coincide with task inventory survey results for three credentialing programs. Results suggest that subject matter expert ratings differ in systematic ways from task inventory survey results and that task lists generated based solely on subject matter experts’ intuitions generally lead to narrower task lists. Results also indicated that there can be key differences for procedures and non-procedures, with subject matter experts’ judgments often tending to exhibit lower agreement levels with task inventory survey results for procedures than for non-procedures. We recommend that organizations performing job analyses think very carefully before relying solely on subject matter experts’ judgments as their primary method of job analysis.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81072107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research.","authors":"Lingjun He, R. Levine, J. Fan, Joshua Beemer, Jeanne Stronach","doi":"10.7275/1WPR-M024","DOIUrl":"https://doi.org/10.7275/1WPR-M024","url":null,"abstract":"In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of random forest in circumstances where the regression assumptions are often violated in big data applications. Random forest is a model averaging procedure where each tree is constructed based on a bootstrap sample of the data set. In particular, we emphasize the ease of application, low computational cost, high predictive accuracy, flexibility, and interpretability of random forest machinery. Our overall recommendation is that institutional researchers look beyond classical regression and single decision tree analytics tools, and consider random forest as the predominant method for prediction tasks. The proposed points of view are detailed and illustrated through a simulation experiment and analyses of data from real institutional research projects.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78672506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Evaluation of Normal Versus Lognormal Distribution in Data Description and Empirical Analysis","authors":"R. Diwakar","doi":"10.7275/0EAT-HB38","DOIUrl":"https://doi.org/10.7275/0EAT-HB38","url":null,"abstract":"Many existing methods of statistical inference and analysis rely heavily on the assumption that the data are normally distributed. However, the normality assumption is not fulfilled when dealing with data which does not contain negative values or are otherwise skewed – a common occurrence in diverse disciplines such as finance, economics, political science, sociology, philology, biology and physical and industrial processes. In this situation, a lognormal distribution may better represent the data than the normal distribution. In this paper, I re-visit the key attributes of the normal and lognormal distributions, and demonstrate through an empirical analysis of the ‘number of political parties' in India, how logarithmic transformation can help in bringing a lognormally distributed data closer to a normal one. The paper also provides further empirical evidence to show that many variables of interest to political and other social scientists could be better modelled using the lognormal distribution. More generally, the paper emphasises the potential for improved description and empirical analysis of quantitative data by paying more attention to its distribution, and complements previous publications in Practical Research and Assessment Evaluation (PARE) on this subject.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80622306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing Rubrics to Assess Complex (Generic) Skills in the Classroom: How to Distinguish Skills’ Mastery Levels?","authors":"E. Rusman, K. Dirkx","doi":"10.7275/XFP0-8228","DOIUrl":"https://doi.org/10.7275/XFP0-8228","url":null,"abstract":"Many schools use analytic rubrics to (formatively) assess complex, generic or transversal (21st century) skills, such as collaborating and presenting. In rubrics, performance indicators on different levels of mastering a skill (e.g., novice, practiced, advanced, talented) are described. However, the dimensions used to describe the different mastery levels vary within and across rubrics and are in many cases not consistent, concise and often trivial, thereby hampering the quality of rubrics used to learn and assess complex skills. In this study we reviewed 600 rubrics available in three international databases (Rubistar, For All Rubrics, i-rubrics) and analyzed the dimensions found within 12 strictly selected rubrics that are currently used to distinguish mastery levels and describe performance indicators for the skill 'collaboration' at secondary schools. These dimensions were subsequently defined and categorized. This resulted in 13 different dimensions, clustered in 6 categories, feasible for defining skills’ mastery levels in rubrics. The identified dimensions can specifically support both teachers and researchers to construct, review and investigate performance indicators for each mastery level of a complex skill. On a more general level, they can support analysis of the overall quality of analytic rubrics to (formatively) assess complex skills.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82741140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Miscalculation of Interrater Reliability: A Case Study Involving the AAC&U VALUE Rubrics.","authors":"R. F. Szafran","doi":"10.7275/Y36W-HG55","DOIUrl":"https://doi.org/10.7275/Y36W-HG55","url":null,"abstract":"Institutional assessment of student learning objectives has become a fact-of-life in American higher education and the Association of American Colleges and Universities’ (AAC&U) VALUE Rubrics have become a widely adopted evaluation and scoring tool for student work. As faculty from a variety of disciplines, some less familiar with the psychometric literature, are drawn into assessment roles, it is important to point out two easily made but serious errors in what might appear to be one of the more straightforward assessments of measurement quality—interrater reliability. The first error which can occur when a third rater is brought in to adjudicate a discrepancy in the scores reported by an initial two raters has been well-documented in the literature but never before illustrated with AAC&U rubrics. The second error is to cease training before the raters have demonstrated a satisfactory level of interrater reliability. This research note describes an actual case study in which the interrater reliability of the AAC&U rubrics was incorrectly reported and when correctly reported found to be inadequate. The note concludes with recommendations for the correct measurement of interrater reliability.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79227302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advocating the Broad Use of the Decision Tree Method in Education","authors":"C. Gomes, L. Almeida","doi":"10.7275/2W3N-0F07","DOIUrl":"https://doi.org/10.7275/2W3N-0F07","url":null,"abstract":"Predictive studies have been widely undertaken in the field of education to provide strategic information about the extensive set of processes related to teaching and learning, as well as about what variables predict certain educational outcomes, such as academic achievement or dropout. As in any other area, there is a set of standard techniques that is usually used in predictive studies in the field education. Even though the Decision Tree Method is a well-known and standard approach in Data Mining and Machine Learning, and is broadly used in data science since the 1980's, this method is not part of the mainstream techniques used in predictive studies in the field of education. In this paper, we support a broad use of the Decision Tree Method in education. Instead of presenting formal algorithms or mathematical axioms to present the Decision Tree Method, we strictly present the method in practical terms, focusing on the rationale of the method, on how to interpret its results, and also, on the reasons why it should be broadly applied. We first show the modus operandi of the Decision Tree Method through a didactic example; afterwards, we apply the method in a classification task, in order to analyze specific educational data.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77680146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Use of Formative Assessment by High School Teachers.","authors":"Melanie Brink, D. Bartz","doi":"10.7275/ZH1K-ZK32","DOIUrl":"https://doi.org/10.7275/ZH1K-ZK32","url":null,"abstract":"The purpose of this mixed-methods study was to gain insights and understandings of high school teachers’ perceptions and use of formative assessment to enhance their planning, individualization of instruction, and adjustment of course content to improve student learning. The study was conducted over two years in a midwestern high school of approximately 1,000 students. Crucial to the three project teachers’ understanding of formative assessment was developing and using preset curriculum road maps that tightly aligned course goals, learning objectives, activities, instructional methods, and assessment. The in-depth case studies of the sample’s three teachers revealed that, when provided with specific information about formative assessment through staff development, they became more positive toward such assessment, and their implementation skills were greatly improved. The staff development had an especially positive impact on the teachers’ understanding and skill sets for individualizing instructional practices. The personalization of the staff development proved to be the most beneficial when it tailored the content to the varying levels of initial proficiency of the three sample teachers. Support for formative assessment by the administrative team members was essential to creating a cultural shift from summative to formative assessment.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83284391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}