International Journal of Testing: Latest Articles

Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information
IF 1.7
International Journal of Testing Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1635604
S. Morris, Mike Bass, Elizabeth Howard, R. Neapolitan
{"title":"Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information","authors":"S. Morris, Mike Bass, Elizabeth Howard, R. Neapolitan","doi":"10.1080/15305058.2019.1635604","DOIUrl":"https://doi.org/10.1080/15305058.2019.1635604","url":null,"abstract":"The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the Patient-Reported Outcomes Measurement Information System Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall, particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. 
Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1635604","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43767801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
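The abstract above contrasts two stopping rules for a CAT. A minimal sketch of the idea follows, assuming the usual IRT approximation SE = 1 / sqrt(test information). The function names, the `se_threshold` and `min_se_reduction` parameters, and their default values are hypothetical illustrations; the abstract does not give the actual PSER parameters, so this is not the authors' algorithm.

```python
import math


def standard_error(total_information: float) -> float:
    """Approximate SE of the trait estimate as 1 / sqrt(test information)."""
    if total_information <= 0:
        return math.inf
    return 1.0 / math.sqrt(total_information)


def should_stop(info_so_far, best_remaining_item_info,
                se_threshold=0.3, min_se_reduction=0.01):
    """Decide whether to end the test.

    info_so_far: information accumulated from administered items.
    best_remaining_item_info: information of the most informative unused
        item at the current trait estimate, or None if the bank is empty.
    se_threshold, min_se_reduction: hypothetical tuning values.
    """
    se_now = standard_error(info_so_far)

    # SE rule: stop once the estimate is precise enough.
    if se_now < se_threshold:
        return True, "se_threshold"

    if best_remaining_item_info is None:
        return True, "bank_exhausted"

    # Simplified PSER-style rule: stop when the best remaining item is
    # predicted to reduce the SE by less than a minimum amount.
    se_next = standard_error(info_so_far + best_remaining_item_info)
    if se_now - se_next < min_se_reduction:
        return True, "small_predicted_reduction"

    return False, "continue"
```

With these assumed defaults, a respondent for whom the bank has only weakly informative items left is stopped by the second rule even though the SE threshold has not been reached, which is the behavior the abstract attributes to PSER.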
True or False? Keying Direction and Acquiescence Influence the Validity of Socio-Emotional Skills Items in Predicting High School Achievement
IF 1.7
International Journal of Testing Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1673398
Ricardo Primi, Filip De Fruyt, Daniel Santos, Stephen Antonoplis, O. John
{"title":"True or False? Keying Direction and Acquiescence Influence the Validity of Socio-Emotional Skills Items in Predicting High School Achievement","authors":"Ricardo Primi, Filip De Fruyt, Daniel Santos, Stephen Antonoplis, O. John","doi":"10.1080/15305058.2019.1673398","DOIUrl":"https://doi.org/10.1080/15305058.2019.1673398","url":null,"abstract":"What type of items, keyed positively or negatively, makes social-emotional skill or personality scales more valid? The present study examines the different criterion validities of true- and false-keyed items, before and after correction for acquiescence. The sample included 12,987 children and adolescents from 425 schools of the State of São Paulo, Brazil (ages 11–18 attending grades 6–12). They answered a computerized 162-item questionnaire measuring 18 facets grouped into five broad domains of social-emotional skills, i.e.: Open-mindedness (O), Conscientious Self-Management (C), Engaging with others (E), Amity (A), and Negative-Emotion Regulation (N). All facet scales were fully balanced (3 true-keyed and 3 false-keyed items per facet). Criterion validity coefficients of scales composed of only true-keyed items versus only false-keyed items were compared. The criterion measure was a standardized achievement test of language and math ability. We found that coefficients were almost twice as big for false-keyed items’ scales as for true-keyed items’ scales. After correcting for acquiescence, coefficients became more similar. Acquiescence suppresses the criterion validity of unbalanced scales composed of true-keyed items. 
We conclude that balanced scales with pairs of true and false keyed items make a better scale in terms of internal structural and predictive validity.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1673398","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49361168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
The Recovery of Correlation Between Latent Abilities Using Compensatory and Noncompensatory Multidimensional IRT Models
IF 1.7
International Journal of Testing Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1692212
Yanyan Fu, Tyler Strachan, E. Ip, John T. Willse, Shyh-Huei Chen, Terry A. Ackerman
{"title":"The Recovery of Correlation Between Latent Abilities Using Compensatory and Noncompensatory Multidimensional IRT Models","authors":"Yanyan Fu, Tyler Strachan, E. Ip, John T. Willse, Shyh-Huei Chen, Terry A. Ackerman","doi":"10.1080/15305058.2019.1692212","DOIUrl":"https://doi.org/10.1080/15305058.2019.1692212","url":null,"abstract":"This research examined correlation estimates between latent abilities when using the two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that the recovery of the latent correlation was best when the test contained 100% of simple structure items for all models and conditions. When a test measured weakly discriminated dimensions, it became harder to recover the latent correlation. Results also showed that increasing the sample size, test length, or using simpler models (i.e., two-parameter logistic rather than three-parameter logistic, compensatory rather than noncompensatory) could improve the recovery of latent correlation.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1692212","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44045191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment
IF 1.7
International Journal of Testing Pub Date : 2020-02-12 DOI: 10.1080/15305058.2020.1720216
Xiuyan Guo, Pui‐wa Lei
{"title":"Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment","authors":"Xiuyan Guo, Pui‐wa Lei","doi":"10.1080/15305058.2020.1720216","DOIUrl":"https://doi.org/10.1080/15305058.2020.1720216","url":null,"abstract":"Little research has been done on the effects of peer raters’ quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters’ qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment where training and motivation interventions were manipulated, 24 classes with 838 high school students were randomly assigned to study conditions. Inter-rater error, intra-rater error and criterion error indices for peer ratings on four selected essays were analyzed using hierarchical linear models. Results indicated that peer raters’ content knowledge, previous rating experience, and rating motivation were associated with rating errors. This study also found some significant interactions between peer raters’ quality characteristics. Implications for in-person and online peer assessments as well as future directions are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2020.1720216","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43660947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The Relationship between Response-Time Effort and Accuracy in PISA Science Multiple Choice Items
IF 1.7
International Journal of Testing Pub Date : 2020-01-10 DOI: 10.1080/15305058.2019.1706529
M. Michaelides, M. Ivanova, C. Nicolaou
{"title":"The Relationship between Response-Time Effort and Accuracy in PISA Science Multiple Choice Items","authors":"M. Michaelides, M. Ivanova, C. Nicolaou","doi":"10.1080/15305058.2019.1706529","DOIUrl":"https://doi.org/10.1080/15305058.2019.1706529","url":null,"abstract":"The study examined the relationship between examinees’ test-taking effort and their accuracy rate on items from the PISA 2015 assessment. The 10% normative threshold method was applied on Science multiple-choice items in the Cyprus sample to detect rapid guessing behavior. Results showed that the extent of rapid guessing across simple and complex multiple-choice items was on average less than 6% per item. Rapid guessers were identified, and for most items their accuracy was lower than the accuracy for students engaging in solution-based behavior. Examinees with higher overall performance on the test items tended to engage in less rapid guessing than their lower performing peers. Overall, this empirical investigation presents original evidence on test-taking effort as measured by response time in PISA items and tests propositions of Wise’s (2017) Test-Taking Theory.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1706529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43585415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
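The abstract above detects rapid guessing with a 10% normative threshold on response time. A minimal sketch of that kind of rule follows, assuming each item's threshold is 10% of its mean response time. The function names, the input layout (item ID mapped to a list of response times in seconds), and the 10-second cap are assumptions for illustration and are not taken from the abstract.

```python
from statistics import mean


def normative_thresholds(response_times, fraction=0.10, cap_seconds=10.0):
    """Per-item threshold: a fraction of the item's mean response time.

    response_times: dict mapping item ID -> list of times in seconds.
    cap_seconds: assumed upper bound on any threshold (hypothetical here).
    """
    return {
        item: min(fraction * mean(times), cap_seconds)
        for item, times in response_times.items()
    }


def flag_rapid_guesses(response_times, thresholds):
    """Mark each response True if it was faster than the item's threshold."""
    return {
        item: [t < thresholds[item] for t in times]
        for item, times in response_times.items()
    }


def rapid_guess_rate(flags):
    """Share of flagged responses per item."""
    return {item: sum(marks) / len(marks) for item, marks in flags.items()}
```

Responses that are not flagged would be treated as solution-based behavior, and accuracy can then be compared between the two groups item by item.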
Log Data Analysis with ANFIS: A Fuzzy Neural Network Approach
IF 1.7
International Journal of Testing Pub Date : 2020-01-02 DOI: 10.1080/15305058.2018.1551225
Ying Cui, Qi Guo, Jacqueline P. Leighton, Man-Wai Chu
{"title":"Log Data Analysis with ANFIS: A Fuzzy Neural Network Approach","authors":"Ying Cui, Qi Guo, Jacqueline P. Leighton, Man-Wai Chu","doi":"10.1080/15305058.2018.1551225","DOIUrl":"https://doi.org/10.1080/15305058.2018.1551225","url":null,"abstract":"This study explores the use of the Adaptive Neuro-Fuzzy Inference System (ANFIS), a neuro-fuzzy approach, to analyze the log data of technology-based assessments to extract relevant features of student problem-solving processes, and develop and refine a set of fuzzy logic rules that could be used to interpret student performance. The log data that record student response processes while solving a science simulation task were analyzed with ANFIS. Results indicate the ANFIS analysis could generate and refine a set of fuzzy rules that shed light on the process of how students solve the simulation task. We conclude the article by discussing the advantages of combining human judgments with the learning capacity of ANFIS for log data analysis and outlining the limitations of the current study and areas of future research.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1551225","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48938428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Engineering a Twenty-First Century Reading Comprehension Assessment System Utilizing Scenario-Based Assessment Techniques
IF 1.7
International Journal of Testing Pub Date : 2020-01-02 DOI: 10.1080/15305058.2018.1551224
J. Sabatini, T. O’Reilly, Jonathan P. Weeks, Zuowei Wang
{"title":"Engineering a Twenty-First Century Reading Comprehension Assessment System Utilizing Scenario-Based Assessment Techniques","authors":"J. Sabatini, T. O’Reilly, Jonathan P. Weeks, Zuowei Wang","doi":"10.1080/15305058.2018.1551224","DOIUrl":"https://doi.org/10.1080/15305058.2018.1551224","url":null,"abstract":"The construct of reading comprehension has changed significantly in the twenty-first century; however, some test designs have not evolved sufficiently to capture these changes. Specifically, the nature of literacy sources and skills required has changed (wrought primarily by widespread use of digital technologies). Modern theories of comprehension and discourse processes have been developed to accommodate these changes, and the learning sciences have followed suit. These influences have significant implications for how we think about the development of comprehension proficiency across grades. In this paper, we describe a theoretically driven, developmentally sensitive assessment system based on a scenario-based assessment paradigm, and present evidence for its feasibility and psychometric soundness.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1551224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47386975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
The (Non)Impact of Differential Test Taker Engagement on Aggregated Scores
IF 1.7
International Journal of Testing Pub Date : 2020-01-02 DOI: 10.1080/15305058.2019.1605999
S. Wise, J. Soland, Y. Bo
{"title":"The (Non)Impact of Differential Test Taker Engagement on Aggregated Scores","authors":"S. Wise, J. Soland, Y. Bo","doi":"10.1080/15305058.2019.1605999","DOIUrl":"https://doi.org/10.1080/15305058.2019.1605999","url":null,"abstract":"Disengaged test taking tends to be most prevalent with low-stakes tests. This has led to questions about the validity of aggregated scores from large-scale international assessments such as PISA and TIMSS, as previous research has found a meaningful correlation between the mean engagement and mean performance of countries. The current study, using data from the computer-based version of the PISA-Based Test for Schools, examined the distortive effects of differential engagement on aggregated school-level scores. The results showed that, although there was considerable differential engagement among schools, the school means were highly stable due to two factors. First, any distortive effects of disengagement in a school were diluted by a high proportion of the students exhibiting no non-effortful behavior. Second, and most interestingly, disengagement produced both positive and negative distortion of individual student scores, which tended to cancel out much of the net distortive effect on the school’s mean.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1605999","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47045097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information.
IF 1.7
International Journal of Testing Pub Date : 2020-01-01 Epub Date: 2019-07-16
Scott B Morris, Michael Bass, Elizabeth Howard, Richard E Neapolitan
{"title":"Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information.","authors":"Scott B Morris, Michael Bass, Elizabeth Howard, Richard E Neapolitan","doi":"","DOIUrl":"","url":null,"abstract":"The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached, and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the PROMIS Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall and particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. 
Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518406/pdf/nihms-1534260.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38521672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations
IF 1.7
International Journal of Testing Pub Date : 2019-10-02 DOI: 10.1080/15305058.2019.1631024
M. Oliveri
{"title":"ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations","authors":"M. Oliveri","doi":"10.1080/15305058.2019.1631024","DOIUrl":"https://doi.org/10.1080/15305058.2019.1631024","url":null,"abstract":"These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the fair and valid assessment of linguistically or culturally diverse populations. They are meant to apply to most, if not all, aspects of the development, administration, scoring, and use of assessments; and are intended to supplement other existing professional standards or guidelines for testing and assessment. That is, these guidelines focus on the types of adaptations and considerations to use when developing, reviewing, and interpreting items and test scores from tests administered to linguistically or culturally diverse populations. 
Other guidelines such as the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) or Guidelines for Best Practice in Cross-Cultural Surveys (Survey Research Center, 2016) may also be relevant to testing linguistically and culturally diverse populations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2019-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1631024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49265430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30