{"title":"Getting Lucky: How Guessing Threatens the Validity of Performance Classifications","authors":"B. P. Foley","doi":"10.7275/1G6P-4Y79","DOIUrl":"https://doi.org/10.7275/1G6P-4Y79","url":null,"abstract":"There is always a chance that examinees will answer multiple choice (MC) items correctly by guessing. Design choices in some modern exams have created situations where guessing at random through the full exam—rather than only for a subset of items where the examinee does not know the answer— can be an effective strategy to pass the exam. This paper describes two case studies to illustrate this problem, discusses test development decisions that can help address the situation, and provides recommendations to testing professionals to help identify when guessing at random can be an effective strategy to pass the exam.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87713731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial on Using Regression Models with Count Outcomes Using R.","authors":"A Alexander Beaujean, G. Morgan","doi":"10.7275/PJ8C-H254","DOIUrl":"https://doi.org/10.7275/PJ8C-H254","url":null,"abstract":"Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix. Count variables such as number of times a student reached a goal, discipline referrals, and absences are ubiquitous in school settings. After a review of published single-case design studies Shadish and Sullivan (2011) recently concluded that nearly all outcome variables were some form of a count. Yet, most analyses they reviewed used traditional data analysis methods designed for normally-distributed continuous data.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91216241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods for Examining the Psychometric Quality of Subscores: A Review and Application.","authors":"Jonathan Wedman, Per-Erik Lyrén","doi":"10.7275/NG3Q-0D19","DOIUrl":"https://doi.org/10.7275/NG3Q-0D19","url":null,"abstract":"When subscores on a test are reported to the test taker, the appropriateness of reporting them depends on whether they provide useful information above what is provided by the total score. Subscore ...","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75259214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RMP Evaluations, Course Easiness, and Grades: Are they Related?","authors":"S. A. Rizvi","doi":"10.7275/914Z-7K31","DOIUrl":"https://doi.org/10.7275/914Z-7K31","url":null,"abstract":"This paper investigates the relationship between the student evaluations of the instructors at the RateMyProfessors.com (RMP) website and the average grades awarded by those instructors. As of Spring 2012, the RMP site included evaluations of 538 full-and part-time instructors at the College of Staten Island (CSI). We selected the evaluations of the 419 instructors who taught at CSI for at least two semesters from Fall 2009 to Spring 2011 and had at least ten evaluations. This research indicates that there is a strong correlation between RMP’s overall evaluation and easiness scores. However, the perceived easiness of an instructor/course does not always result in higher grades for students. Furthermore, we found that the instructors who received high overall evaluation and easiness scores (4.0 to 5.0) at the RMP site do not necessarily award high grades. This is a very important finding as it disputes the argument that instructors receive high evaluations because they are easy or award high grades. On the other hand, instructors of the courses that are perceived to be difficult (RMP easiness score of 3.0 or less) are likely to be tough graders. However, instructors who received moderate overall evaluation and easiness scores (between 3.0 and 4.0) the RMP site had a high correlation between these scores and average grade awarded by those instructors. Finally, our research shows that the instructors in non-STEM disciplines award higher grades than the instructors in STEM disciplines. Non-STEM instructors also received higher overall evaluations than their STEM counterparts and non-STEM courses were perceived easier by the students than STEM courses.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89547310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real Cost-Benefit Analysis Is Needed in American Public Education.","authors":"Bert D. Stoneberg","doi":"10.7275/T2BA-A657","DOIUrl":"https://doi.org/10.7275/T2BA-A657","url":null,"abstract":"Public school critics often point to rising expenditures and relatively flat test scores to justify their school reform agendas. The claims are flawed because their analyses fail to account for the difference in data types between dollars (ratio) and test scores (interval). A cost-benefit analysis using dollars as a common metric for both costs and benefits can provide a good estimate of their relationship. It also acknowledges that costs and benefits are both subject to inflation. The National Center for Education Research administers a methods training program for researchers who want to know more about cost-benefit analyses on education policies and programs.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84705136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking Errors between Two Populations and Tests: A Case Study in International Surveys in Education.","authors":"D. Hastedt, Deana Desa","doi":"10.7275/YK4S-0A49","DOIUrl":"https://doi.org/10.7275/YK4S-0A49","url":null,"abstract":"This simulation study was prompted by the current increased interest in linking national studies to international large-scale assessments (ILSAs) such as IEA’s TIMSS, IEA’s PIRLS, and OECD’s PISA. Linkage in this scenario is achieved by including items from the international assessments in the national assessments on the premise that the average achievement scores from the latter can be linked to the international metric. In addition to raising issues associated with different testing conditions, administrative procedures, and the like, this approach also poses psychometric challenges. This paper endeavors to shed some light on the effects that can be expected, the linkage errors in particular, by countries using this practice. The ILSA selected for this simulation study was IEA TIMSS 2011, and the three countries used as the national assessment cases were Botswana, Honduras, and Tunisia, all of which participated in TIMSS 2011. The items selected as items common to the simulated national tests and the international test came from the Grade 4 TIMSS 2011 mathematics items that IEA released into the public domain after completion of this assessment. The findings of the current study show that linkage errors seemed to achieve acceptable levels if 30 or more items were used for the linkage, although the errors were still significantly higher compared to the TIMSS’ cutoffs. Comparison of the estimated country averages based on the simulated national surveys and the averages based on the international TIMSS assessment revealed only one instance across the three countries of the estimates approaching parity. Also, the percentages of students in these countries who actually reached the defined benchmarks on the TIMSS achievement scale differed significantly from the results based on TIMSS and the results for the simulated national assessments. As a conclusion, we advise against using groups of released items from international assessments in national assessments in order to link the results of the former to the latter.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74501819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Introduction to Missing Data in the Context of Differential Item Functioning.","authors":"Kathleen P Banks","doi":"10.7275/FPG0-5079","DOIUrl":"https://doi.org/10.7275/FPG0-5079","url":null,"abstract":"This article introduces practitioners and researchers to the topic of missing data in the context of differential item functioning (DIF), reviews the current literature on the issue, discusses implications of the review, and offers suggestions for future research. A total of nine studies were reviewed. All of these studies determined what effect particular missing data techniques would have on the results of certain DIF detection procedures under various conditions. The most important finding of this review involved the use of zero imputation as a missing data technique. The review shows that zero imputation can lead to inflated Type I errors, especially in cases where the examinees ability level has not been taken into consideration.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86561395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interrater Reliability in Large-Scale Assessments--Can Teachers Score National Tests Reliably without External Controls?.","authors":"Anna Lind Pantzare","doi":"10.7275/Y2EN-ZM89","DOIUrl":"https://doi.org/10.7275/Y2EN-ZM89","url":null,"abstract":"In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examin ...","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76298766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Is Your Teacher Rubric? Extracting Teachers' Assessment Constructs.","authors":"Heejeong Jeong","doi":"10.7275/M3SA-P692","DOIUrl":"https://doi.org/10.7275/M3SA-P692","url":null,"abstract":"Rubrics not only document the scales and criteria of what is assessed, but can also represent the assessment construct of the developer. Rubrics display the key assessment criteria, and the simplicity or complexity of the rubric can illustrate the meaning associated with the score. For this study, five experienced teachers developed a rubric for an EFL (English as a Foreign Language) descriptive writing task. Results show that even for the same task, teachers developed different formats and styles of rubric with both similar and different criteria. The teacher rubrics were analyzed for assessment criteria, rubric type and scale type. Findings illustrate that in terms of criteria, all teacher rubrics had five areas in common: comprehension, paragraph structure, sentence structure, vocabulary, and grammar. The criteria that varied were mechanics, length, task completion, and selfcorrection. Rubric style and scales also were different among teachers. Teachers who valued global concerns (i.e., comprehension) in writing designed more general holistic rubrics, while teachers who focused more on sentence-level concerns (i.e., grammar) developed analytic rubrics with more details. The assessment construct of the teacher was shown in the rubric through assessment criteria, rubric style, and scale.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76579259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining and Measuring Academic Success.","authors":"Travis T. York, Charles W. Gibson, Susan Rankin","doi":"10.7275/HZ5X-TX03","DOIUrl":"https://doi.org/10.7275/HZ5X-TX03","url":null,"abstract":"Despite, and perhaps because of its amorphous nature, the term ‘academic success’ is one of the most widely used constructs in educational research and assessment within higher education. This paper conducts an analytic literature review to examine the use and operationalization of the term in multiple academic fields. Dominant definitions of the term are conceptually evaluated using Astin’s I-E-O model resulting in the proposition of a revised definition and new conceptual model of academic success. Measurements of academic success found throughout the literature are presented in accordance with the presented model of academic success. These measurements are provided with details in a user-friendly table (Appendix B). Results also indicate that grades and GPA are the most commonly used measure of academic success. Finally, recommendations are given for future research and practice to increase effective assessment of academic success.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74897889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}