Scott B. Morris, Michael Bass, Elizabeth Howard, Richard E. Neapolitan
"Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information"
International Journal of Testing, 2020-04-02. DOI: 10.1080/15305058.2019.1635604

Abstract: The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the Patient-Reported Outcomes Measurement Information System (PROMIS) Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule outperformed the SE stopping rule overall, particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.
Ricardo Primi, Filip De Fruyt, Daniel Santos, Stephen Antonoplis, O. John
"True or False? Keying Direction and Acquiescence Influence the Validity of Socio-Emotional Skills Items in Predicting High School Achievement"
International Journal of Testing, 2020-04-02. DOI: 10.1080/15305058.2019.1673398

Abstract: What type of items, keyed positively or negatively, makes social-emotional skill or personality scales more valid? The present study examines the criterion validities of true-keyed and false-keyed items, before and after correction for acquiescence. The sample included 12,987 children and adolescents from 425 schools in the State of São Paulo, Brazil (ages 11–18, attending grades 6–12). They answered a computerized 162-item questionnaire measuring 18 facets grouped into five broad domains of social-emotional skills: Open-mindedness (O), Conscientious Self-Management (C), Engaging with others (E), Amity (A), and Negative-Emotion Regulation (N). All facet scales were fully balanced (3 true-keyed and 3 false-keyed items per facet). Criterion validity coefficients of scales composed of only true-keyed items were compared with those of scales composed of only false-keyed items. The criterion measure was a standardized achievement test of language and math ability. We found that validity coefficients were almost twice as large for scales of false-keyed items as for scales of true-keyed items. After correction for acquiescence, the coefficients became more similar. Acquiescence suppresses the criterion validity of unbalanced scales composed of true-keyed items. We conclude that balanced scales with pairs of true-keyed and false-keyed items make a better scale in terms of internal structure and predictive validity.
Yanyan Fu, Tyler Strachan, E. Ip, John T. Willse, Shyh-Huei Chen, Terry A. Ackerman
"The Recovery of Correlation Between Latent Abilities Using Compensatory and Noncompensatory Multidimensional IRT Models"
International Journal of Testing, 2020-04-02. DOI: 10.1080/15305058.2019.1692212

Abstract: This research examined correlation estimates between latent abilities when using two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that recovery of the latent correlation was best when the test consisted entirely of simple-structure items, for all models and conditions. When a test measured weakly discriminated dimensions, the latent correlation became harder to recover. Results also showed that increasing the sample size or test length, or using simpler models (i.e., two-parameter logistic rather than three-parameter logistic, compensatory rather than noncompensatory), could improve the recovery of the latent correlation.
{"title":"Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment","authors":"Xiuyan Guo, Pui‐wa Lei","doi":"10.1080/15305058.2020.1720216","DOIUrl":"https://doi.org/10.1080/15305058.2020.1720216","url":null,"abstract":"Little research has been done on the effects of peer raters’ quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters’ qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment where training and motivation interventions were manipulated, 24 classes with 838 high school students were randomly assigned to study conditions. Inter-rater error, intra-rater error and criterion error indices for peer ratings on four selected essays were analyzed using hierarchical linear models. Results indicated that peer raters’ content knowledge, previous rating experience, and rating motivation were associated with rating errors. This study also found some significant interactions between peer raters’ quality characteristics. Implications for in-person and online peer assessments as well as future directions are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2020.1720216","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43660947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Relationship between Response-Time Effort and Accuracy in PISA Science Multiple Choice Items","authors":"M. Michaelides, M. Ivanova, C. Nicolaou","doi":"10.1080/15305058.2019.1706529","DOIUrl":"https://doi.org/10.1080/15305058.2019.1706529","url":null,"abstract":"The study examined the relationship between examinees’ test-taking effort and their accuracy rate on items from the PISA 2015 assessment. The 10% normative threshold method was applied on Science multiple-choice items in the Cyprus sample to detect rapid guessing behavior. Results showed that the extent of rapid guessing across simple and complex multiple-choice items was on average less than 6% per item. Rapid guessers were identified, and for most items their accuracy was lower than the accuracy for students engaging in solution-based behavior. Examinees with higher overall performance on the test items tended to engage in less rapid guessing than their lower performing peers. Overall, this empirical investigation presents original evidence on test-taking effort as measured by response time in PISA items and tests propositions of Wise’s (2017) Test-Taking Theory.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1706529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43585415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Cui, Qi Guo, Jacqueline P. Leighton, Man-Wai Chu
"Log Data Analysis with ANFIS: A Fuzzy Neural Network Approach"
International Journal of Testing, 2020-01-02. DOI: 10.1080/15305058.2018.1551225

Abstract: This study explores the use of the Adaptive Neuro-Fuzzy Inference System (ANFIS), a neuro-fuzzy approach, to analyze the log data of technology-based assessments, extract relevant features of student problem-solving processes, and develop and refine a set of fuzzy logic rules that could be used to interpret student performance. Log data recording student response processes while solving a science simulation task were analyzed with ANFIS. Results indicate that the ANFIS analysis could generate and refine a set of fuzzy rules that shed light on how students solve the simulation task. We conclude the article by discussing the advantages of combining human judgment with the learning capacity of ANFIS for log data analysis and by outlining the limitations of the current study and areas for future research.
J. Sabatini, T. O'Reilly, Jonathan P. Weeks, Zuowei Wang
"Engineering a Twenty-First Century Reading Comprehension Assessment System Utilizing Scenario-Based Assessment Techniques"
International Journal of Testing, 2020-01-02. DOI: 10.1080/15305058.2018.1551224

Abstract: The construct of reading comprehension has changed significantly in the twenty-first century; however, some test designs have not evolved sufficiently to capture these changes. Specifically, the nature of literacy sources and the skills required has changed (wrought primarily by the widespread use of digital technologies). Modern theories of comprehension and discourse processes have been developed to accommodate these changes, and the learning sciences have followed suit. These influences have significant implications for how we think about the development of comprehension proficiency across grades. In this paper, we describe a theoretically driven, developmentally sensitive assessment system based on a scenario-based assessment paradigm and present evidence for its feasibility and psychometric soundness.
{"title":"The (Non)Impact of Differential Test Taker Engagement on Aggregated Scores","authors":"S. Wise, J. Soland, Y. Bo","doi":"10.1080/15305058.2019.1605999","DOIUrl":"https://doi.org/10.1080/15305058.2019.1605999","url":null,"abstract":"Disengaged test taking tends to be most prevalent with low-stakes tests. This has led to questions about the validity of aggregated scores from large-scale international assessments such as PISA and TIMSS, as previous research has found a meaningful correlation between the mean engagement and mean performance of countries. The current study, using data from the computer-based version of the PISA-Based Test for Schools, examined the distortive effects of differential engagement on aggregated school-level scores. The results showed that, although there was considerable differential engagement among schools, the school means were highly stable due to two factors. First, any distortive effects of disengagement in a school were diluted by a high proportion of the students exhibiting no non-effortful behavior. Second, and most interestingly, disengagement produced both positive and negative distortion of individual student scores, which tended to cancel out much of the net distortive effect on the school’s mean.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1605999","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47045097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations","authors":"M. Oliveri","doi":"10.1080/15305058.2019.1631024","DOIUrl":"https://doi.org/10.1080/15305058.2019.1631024","url":null,"abstract":"These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the fair and valid assessment of linguistically or culturally diverse populations. They are meant to apply to most, if not all, aspects of the development, administration, scoring, and use of assessments; and are intended to supplement other existing professional standards or guidelines for testing and assessment. That is, these guidelines focus on the types of adaptations and considerations to use when developing, reviewing, and interpreting items and test scores from tests administered to culturally and linguistically or culturally diverse populations. Other guidelines such as the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) or Guidelines for Best Practice in Cross-Cultural Surveys (Survey Research Center, 2016) may also be relevant to testing linguistically and culturally diverse populations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2019-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1631024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49265430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}