{"title":"Negatively-Worded Multiple Choice Questions: An Avoidable Threat to Validity.","authors":"N. Chiavaroli","doi":"10.7275/5VVY-8613","DOIUrl":"https://doi.org/10.7275/5VVY-8613","url":null,"abstract":"Despite the majority of MCQ writing guides discouraging the use of negatively-worded multiple choice questions (NWQs), they continue to be regularly used both in locally produced examinations and commercially available questions. There are several reasons why the use of NWQs may prove resistant to sound pedagogical advice. Nevertheless, systematic inspection of item-level analysis often reveals anomalous behavior of NWQs on high-stakes examinations, due to otherwise high-performing students selecting the incorrect option for those questions. Highlighting the negative term as commonly recommended does not prevent this, since both anecdotal and empirical evidence suggests that many students answer the question as if it were positively phrased. The continued use of NWQs in high-stakes examinations poses a significant threat to the validity of interpretation based on these assessments. This is a form of ‘construct-irrelevant variance’ within the control of the item writer, and is therefore completely avoidable.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76929701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing multiple-choice items to measure higher-order thinking","authors":"Darina Scully","doi":"10.7275/CA7Y-MM27","DOIUrl":"https://doi.org/10.7275/CA7Y-MM27","url":null,"abstract":"Across education, certification and licensure, there are repeated calls for the development of assessments that target higher-order thinking, as opposed to mere recall of facts. A common assumption is that this necessitates the use of constructed response or essay-style test questions; however, empirical evidence suggests that this may not be the case. In this paper, it is argued that multiple-choice items have the capacity to assess certain higher-order skills. In addition, a series of practical recommendations for test developers seeking to purposefully construct such items is provided.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74879840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Use of Reddit as an Inexpensive Source for High-Quality Data","authors":"Matthew R. Jamnik, D. J. Lane","doi":"10.7275/SWGT-RJ52","DOIUrl":"https://doi.org/10.7275/SWGT-RJ52","url":null,"abstract":"Today, researchers have the ability to conduct their investigations in a number of different manners, including both traditional testing using university subject pool participants and the more recent method of online recruitment. Although the use of internet participants is becoming more popular, this area of research is still very much in its infancy and needs further examination. Additionally, alternative web-based platforms need to be investigated because much of the literature has focused on using Amazon.com’s Mechanical Turk (MTurk). Therefore, the current study recruited an internet population using the website Reddit, and compared them to a traditional undergraduate sample to learn more about this web-based platform. The results demonstrated similarities and distinctions between the two samples. Furthermore, previous findings in the psychological well-being literature were replicated. As a whole, the participants recruited from Reddit provided high-quality data that were inexpensive and comparable to the responses gathered using undergraduate participants. We conclude that this website appears to be a promising tool for the field of psychological assessment, research, and evaluation.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90018097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A note on using eigenvalues in dimensionality assessment","authors":"Cengiz Zopluoglu, Ernest C Davenport","doi":"10.7275/E7GH-0785","DOIUrl":"https://doi.org/10.7275/E7GH-0785","url":null,"abstract":"","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87390666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Causal DIF via Propensity Score Methods.","authors":"Yan Liu, B. Zumbo, P. Gustafson, Yi Huang, Edward Kroc, Amery Wu","doi":"10.7275/EWQZ-N963","DOIUrl":"https://doi.org/10.7275/EWQZ-N963","url":null,"abstract":"A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in situations where, for example, a test is translated into other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g., test language) is responsible for the DIF result because there may exist many confounding variables that lead to the DIF result. The present study aims to (i) demonstrate the application of propensity score methods in psychometric research on DIF for day-to-day researchers, and (ii) describe conditional logistic regression for matched data in a DIF context. Propensity score methods can help to achieve the comparability between different populations or groups with respect to participants’ pre-test differences, which can assist in examining the validity of making a causal claim with regard to DIF.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78205873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial Least Squares Structural Equation Modeling with R.","authors":"Hamdollah Ravand, Purya Baghaei","doi":"10.7275/D2FA-QV48","DOIUrl":"https://doi.org/10.7275/D2FA-QV48","url":null,"abstract":"Structural equation modeling (SEM) has become widespread in educational and psychological research. Its flexibility in addressing complex theoretical models and the proper treatment of measurement error has made it the model of choice for many researchers in the social sciences. Nevertheless, the model imposes some daunting assumptions and restrictions (e.g. normality and relatively large sample sizes) that could discourage practitioners from applying the model. Partial least squares SEM (PLS-SEM) is a nonparametric technique which makes no distributional assumptions and can be estimated with small sample sizes. In this paper a general introduction to PLS-SEM is given and is compared with conventional SEM. Next, step by step procedures, along with R functions, are presented to estimate the model. A data set is analyzed and the outputs are interpreted.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90207681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests.","authors":"Lawrence M. Rudner","doi":"10.7275/Q7ZZ-D655","DOIUrl":"https://doi.org/10.7275/Q7ZZ-D655","url":null,"abstract":"In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows that the conclusion also applies to the probabilities estimated from short subtests of mental abilities and that small samples can yield excellent accuracy. The calculated Bayes probabilities can be used to provide meaningful examinee feedback regardless of whether the test was originally designed to be unidimensional.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89498377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularization Methods for Fitting Linear Models with Small Sample Sizes: Fitting the Lasso Estimator Using R.","authors":"W. H. Finch, M. Finch","doi":"10.7275/JR3D-CQ04","DOIUrl":"https://doi.org/10.7275/JR3D-CQ04","url":null,"abstract":"Researchers and data analysts are sometimes faced with the problem of very small samples, where the number of variables approaches or exceeds the overall sample size; i.e. high dimensional data. In such cases, standard statistical models such as regression or analysis of variance cannot be used, either because the resulting parameter estimates exhibit very high variance and can therefore not be trusted, or because the statistical algorithm cannot converge on parameter estimates at all. There exists an alternative set of model estimation procedures, known collectively as regularization methods, which can be used in such circumstances, and which have been shown through simulation research to yield accurate parameter estimates. The purpose of this paper is to describe, for those unfamiliar with them, the most popular of these regularization methods, the lasso, and to demonstrate its use on an actual high dimensional dataset involving adults with autism, using the R software language. Results of analyses relating measures of executive functioning to a full scale intelligence test score are presented, and implications of using these models are discussed.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73574099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Three Approaches to Correct for Direct and Indirect Range Restrictions: A Simulation Study.","authors":"A. Pfaffel, Barbara Schober, C. Spiel","doi":"10.7275/X4EP-FV42","DOIUrl":"https://doi.org/10.7275/X4EP-FV42","url":null,"abstract":"A common methodological problem in the evaluation of the predictive validity of selection methods, e.g. in educational and employment selection, is that the correlation between predictor and criterion is biased. Thorndike’s (1949) formulas are commonly used to correct for this biased correlation. An alternative approach is to view the selection mechanism as a missing data mechanism. The aim of this study was to compare Thorndike’s formulas for direct and indirect range restriction scenarios with two state-of-the-art approaches for handling missing data: full information maximum likelihood (FIML) and multiple imputation by chained equations (MICE). We conducted Monte Carlo simulations to investigate the accuracy of the population correlation estimates as a function of the selection ratio and the true population correlation in an experimental design. For a direct range restriction scenario, the three approaches are equally accurate. For an indirect range restriction scenario, the corrections using FIML and MICE are more precise than when using Thorndike’s formula. The higher the selection ratio and the true population correlation, the higher the precision of the population correlation estimates. Our findings indicate that both missing data approaches are alternative corrections to Thorndike’s formulas, especially in the case of indirect range restriction.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74165017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling.","authors":"Erin S. Banjanovic, J. Osborne","doi":"10.7275/DZ3R-8N08","DOIUrl":"https://doi.org/10.7275/DZ3R-8N08","url":null,"abstract":"","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74067969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}