Latest Publications from the International Journal of Testing

Using Evidence-Centered Design to Support the Development of Culturally and Linguistically Sensitive Collaborative Problem-Solving Assessments
IF 1.7
International Journal of Testing Pub Date : 2019-01-29 DOI: 10.1080/15305058.2018.1543308
M. Oliveri, René Lawless, R. Mislevy
{"title":"Using Evidence-Centered Design to Support the Development of Culturally and Linguistically Sensitive Collaborative Problem-Solving Assessments","authors":"M. Oliveri, René Lawless, R. Mislevy","doi":"10.1080/15305058.2018.1543308","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543308","url":null,"abstract":"Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such inclusion, however, presents the need for improvements in the conceptualization, design, and analysis of CPS, which challenges us to think differently about assessing the skills than the current focus given to assessing individuals’ substantive knowledge. In this article, we discuss an Evidence-Centered Design approach to assess CPS in a culturally and linguistically diverse educational environment. We demonstrate ways to consider a sociocognitive perspective to conceptualize and model possible linguistic and/or cultural differences between populations along key stages of assessment development including assessment conceptualization and design to help reduce possible construct-irrelevant differences when assessing complex constructs with diverse populations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2019-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543308","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44350922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Assessment of University Students’ Critical Thinking: Next Generation Performance Assessment
IF 1.7
International Journal of Testing Pub Date : 2019-01-24 DOI: 10.1080/15305058.2018.1543309
R. Shavelson, O. Zlatkin‐Troitschanskaia, K. Beck, Susanne Schmidt, Julián P. Mariño
{"title":"Assessment of University Students’ Critical Thinking: Next Generation Performance Assessment","authors":"R. Shavelson, O. Zlatkin‐Troitschanskaia, K. Beck, Susanne Schmidt, Julián P. Mariño","doi":"10.1080/15305058.2018.1543309","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543309","url":null,"abstract":"Following employers’ criticisms and recent societal developments, policymakers and educators have called for students to develop a range of generic skills such as critical thinking (“twenty-first century skills”). So far, such skills have typically been assessed by student self-reports or with multiple-choice tests. An alternative approach is criterion-sampling measurement. This approach leads to developing performance assessments using “criterion” tasks, which are drawn from real-world situations in which students are being educated, both within and across academic or professional domains. One current project, iPAL (The international Performance Assessment of Learning), consolidates previous research and focuses on the next generation performance assessments. In this paper, we present iPAL’s assessment framework and show how it guides the development of such performance assessments, exemplify these assessments with a concrete task, and provide preliminary evidence of its reliability and validity, which allows us to draw initial implications for further test design and development.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2019-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543309","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48194695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
An Examination of Different Methods of Setting Cutoff Values in Person Fit Research
IF 1.7
International Journal of Testing Pub Date : 2019-01-02 DOI: 10.1080/15305058.2018.1464010
A. Mousavi, Ying Cui, Todd Rogers
{"title":"An Examination of Different Methods of Setting Cutoff Values in Person Fit Research","authors":"A. Mousavi, Ying Cui, Todd Rogers","doi":"10.1080/15305058.2018.1464010","DOIUrl":"https://doi.org/10.1080/15305058.2018.1464010","url":null,"abstract":"This simulation study evaluates four different methods of setting cutoff values for person fit assessment, including (a) using fixed cutoff values either from theoretical distributions of person fit statistics, or arbitrarily chosen by the researchers in the literature; (b) using the specific percentile rank of empirical sampling distribution of person fit statistics from simulated fitting responses; (c) using bootstrap method to estimate cutoff values of empirical sampling distribution of person fit statistics from simulated fitting responses; and (d) using the p-value methods to identify misfitting responses conditional on ability levels. The Snijders' (2001), as an index with known theoretical distribution, van der Flier's U3 (1982) and Sijtsma's HT coefficient (1986), as indices with unknown theoretical distribution, were chosen. According to the simulation results, different methods of setting cutoff values tend to produce different levels of Type I error and detection rates, indicating it is critical to select an appropriate method for setting cutoff values in person fit research.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2019-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1464010","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48532510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
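As a rough illustration of the simulation-based approaches described in (b) and (c), the sketch below (an assumption-laden example, not the study's code) generates model-fitting responses under a Rasch model with invented item difficulties and takes the 5th percentile of the resulting empirical distribution as a cutoff. It uses the standardized log-likelihood statistic lz as a simple stand-in for the Snijders, U3, and HT indices studied in the article; the item parameters, ability level, and replication count are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def lz(responses, theta, b):
    """Standardized log-likelihood person-fit statistic (lz)."""
    p = rasch_p(theta, b)
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - e) / np.sqrt(v)

# Hypothetical 40-item test and a target ability level.
b = rng.normal(0.0, 1.0, size=40)
theta = 0.5

# Simulate model-fitting response vectors to build the empirical lz distribution.
sim = np.array([
    lz(rng.binomial(1, rasch_p(theta, b)), theta, b)
    for _ in range(5000)
])

# Empirical cutoff: flag respondents in the lowest 5% of lz as potentially misfitting.
cutoff = np.percentile(sim, 5)
print(f"empirical 5th-percentile cutoff for lz at theta={theta}: {cutoff:.3f}")
```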
A Comparison of the Relative Performance of Four IRT Models on Equating Passage-Based Tests
IF 1.7
International Journal of Testing Pub Date : 2018-12-13 DOI: 10.1080/15305058.2018.1530239
Kyung Yong Kim, Euijin Lim, Won‐Chan Lee
{"title":"A Comparison of the Relative Performance of Four IRT Models on Equating Passage-Based Tests","authors":"Kyung Yong Kim, Euijin Lim, Won‐Chan Lee","doi":"10.1080/15305058.2018.1530239","DOIUrl":"https://doi.org/10.1080/15305058.2018.1530239","url":null,"abstract":"For passage-based tests, items that belong to a common passage often violate the local independence assumption of unidimensional item response theory (UIRT). In this case, ignoring local item dependence (LID) and estimating item parameters using a UIRT model could be problematic because doing so might result in inaccurate parameter estimates, which, in turn, could impact the results of equating. Under the random groups design, the main purpose of this article was to compare the relative performance of the three-parameter logistic (3PL), graded response (GR), bifactor, and testlet models on equating passage-based tests when various degrees of LID were present due to passage. Simulation results showed that the testlet model produced the most accurate equating results, followed by the bifactor model. The 3PL model worked as well as the bifactor and testlet models when the degree of LID was low but returned less accurate equating results than the two multidimensional models as the degree of LID increased. Among the four models, the polytomous GR model provided the least accurate equating results.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1530239","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46453114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
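For readers unfamiliar with the competing models, the minimal sketch below contrasts item response probabilities under the 3PL model and under a common testlet-style extension in which a person-by-passage effect (gamma) enters the logit and induces local dependence among items from the same passage. The item parameters, ability value, and testlet effect are invented for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(7)

def p_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def p_testlet(theta, a, b, c, gamma):
    """Testlet-style extension of the 3PL: gamma is a person-specific effect
    for the passage (testlet) the item belongs to, which induces local
    dependence among items sharing that passage."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b - gamma)))

# Hypothetical passage of five items sharing one testlet effect.
a = np.array([1.2, 0.8, 1.0, 1.5, 0.9])
b = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])
c = np.full(5, 0.2)

theta = 0.4                   # person ability
gamma = rng.normal(0.0, 0.6)  # person-by-passage (testlet) effect

print("3PL probabilities:    ", np.round(p_3pl(theta, a, b, c), 3))
print("Testlet probabilities:", np.round(p_testlet(theta, a, b, c, gamma), 3))
```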
Test Instructions Do Not Moderate the Indirect Effect of Perceived Test Importance on Test Performance in Low-Stakes Testing Contexts
IF 1.7
International Journal of Testing Pub Date : 2018-10-02 DOI: 10.1080/15305058.2017.1396466
S. Finney, Aaron J. Myers, C. Mathers
{"title":"Test Instructions Do Not Moderate the Indirect Effect of Perceived Test Importance on Test Performance in Low-Stakes Testing Contexts","authors":"S. Finney, Aaron J. Myers, C. Mathers","doi":"10.1080/15305058.2017.1396466","DOIUrl":"https://doi.org/10.1080/15305058.2017.1396466","url":null,"abstract":"Assessment specialists expend a great deal of energy to promote valid inferences from test scores gathered in low-stakes testing contexts. Given the indirect effect of perceived test importance on test performance via examinee effort, assessment practitioners have manipulated test instructions with the goal of increasing perceived test importance. Importantly, no studies have investigated the impact of test instructions on this indirect effect. In the current study, students were randomly assigned to one of three test instruction conditions intended to increase test relevance while keeping the test low-stakes to examinees. Test instructions did not impact average perceived test importance, examinee effort, or test performance. Furthermore, the indirect relationship between importance and performance via effort was not moderated by instructions. Thus, the effect of perceived test importance on test scores via expended effort appears consistent across different messages regarding the personal relevance of the test to examinees. The main implication for testing practice is that the effect of instructions may be negligible when reflective of authentic low-stakes test score use. Future studies should focus on uncovering instructions that increase the value of performance to the examinee yet remain truthful regarding score use.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1396466","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49293024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
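To make the mediation language concrete, here is a hedged sketch (with simulated stand-in data, not the study's data or model) of estimating the indirect effect of perceived importance on performance through effort as the product of two regression slopes, with a percentile bootstrap interval. Testing whether instructions moderate this indirect effect would additionally require an interaction term or a multi-group model, which is not shown.

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated stand-in data for 300 hypothetical examinees.
n = 300
importance = rng.normal(0, 1, n)
effort = 0.5 * importance + rng.normal(0, 1, n)                       # a-path
performance = 0.4 * effort + 0.1 * importance + rng.normal(0, 1, n)   # b- and c'-paths

def ols_coefs(y, predictors):
    """OLS coefficients for y ~ intercept + predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

def indirect_effect(imp, eff, perf):
    a = ols_coefs(eff, [imp])[1]          # importance -> effort
    b = ols_coefs(perf, [eff, imp])[1]    # effort -> performance, controlling importance
    return a * b

# Percentile bootstrap interval for the indirect (a*b) effect.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(indirect_effect(importance[idx], effort[idx], performance[idx]))
est = indirect_effect(importance, effort, performance)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {est:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```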
Investigating the Reliability of the Sentence Verification Technique
IF 1.7
International Journal of Testing Pub Date : 2018-09-20 DOI: 10.1080/15305058.2018.1497636
Amanda M Marcotte, Francis Rick, C. Wells
{"title":"Investigating the Reliability of the Sentence Verification Technique","authors":"Amanda M Marcotte, Francis Rick, C. Wells","doi":"10.1080/15305058.2018.1497636","DOIUrl":"https://doi.org/10.1080/15305058.2018.1497636","url":null,"abstract":"Reading comprehension plays an important role in achievement for all academic domains. The purpose of this study is to describe the sentence verification technique (SVT) (Royer, Hastings, & Hook, 1979) as an alternative method of assessing reading comprehension, which can be used with a variety of texts and across diverse populations and educational contexts. Additionally, this study adds a unique contribution to the extant literature on the SVT through an investigation of the precision of the instrument across proficiency levels. Data were gathered from a sample of 464 fourth-grade students from the Northeast region of the United States. Reliability was estimated using one, two, three, and four passage test forms. Two or three passages provided sufficient reliability. The conditional reliability analyses revealed that the SVT test scores were reliable for readers with average to below average proficiency, but did not provide reliable information for students who were very poor or strong readers.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1497636","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45868181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
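One standard way to reason about reliability as a function of the number of passages is the Spearman-Brown projection shown below. It is offered only as an illustration of the length-reliability trade-off the study examines empirically; the single-passage reliability used here is hypothetical, not the study's estimate, and the study's conditional reliability analyses go beyond this overall projection.

```python
def spearman_brown(rel_one, k):
    """Projected reliability of a test lengthened to k parallel passages,
    given the reliability of a single-passage form."""
    return k * rel_one / (1.0 + (k - 1) * rel_one)

# Hypothetical single-passage reliability.
rel_one_passage = 0.55
for k in (1, 2, 3, 4):
    print(f"{k} passage(s): projected reliability = {spearman_brown(rel_one_passage, k):.2f}")
```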
Item Parameter Drift in Context Questionnaires from International Large-Scale Assessments
IF 1.7
International Journal of Testing Pub Date : 2018-09-14 DOI: 10.1080/15305058.2018.1481852
HyeSun Lee, K. Geisinger
{"title":"Item Parameter Drift in Context Questionnaires from International Large-Scale Assessments","authors":"HyeSun Lee, K. Geisinger","doi":"10.1080/15305058.2018.1481852","DOIUrl":"https://doi.org/10.1080/15305058.2018.1481852","url":null,"abstract":"The purpose of the current study was to examine the impact of item parameter drift (IPD) occurring in context questionnaires from an international large-scale assessment and determine the most appropriate way to address IPD. Focusing on the context of psychometric and educational research where scores from context questionnaires composed of polytomous items were employed for the classification of examinees, the current research investigated the impacts of IPD on the estimation of questionnaire scores and classification accuracy with five manipulated factors: the length of a questionnaire, the proportion of items exhibiting IPD, the direction and magnitude of IPD, and three decisions about IPD. The results indicated that the impact of IPD occurring in a short context questionnaire on the accuracy of score estimation and classification of examinees was substantial. The accuracy in classification considerably decreased especially at the lowest and highest categories of a trait. Unlike the recommendation from literature in educational testing, the current study demonstrated that keeping items exhibiting IPD and removing them only for transformation were appropriate when IPD occurred in relatively short context questionnaires. Using 2011 TIMSS data from Iran, an applied example demonstrated the application of provided guidance in making appropriate decisions about IPD.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1481852","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42801965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
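The sketch below illustrates one simple way to represent IPD for a polytomous item: category probabilities are computed under the generalized partial credit model, and drift is mimicked by shifting the item's step parameters at a later administration. The discrimination, step parameters, and the uniform shift are assumptions for illustration, not the study's generating model.

```python
import numpy as np

def gpcm_probs(theta, a, thresholds):
    """Category probabilities for one polytomous item under the
    generalized partial credit model; thresholds are the step parameters."""
    # Cumulative sums of a*(theta - b_v), with 0 for the lowest category.
    steps = np.concatenate([[0.0], np.cumsum(a * (theta - thresholds))])
    num = np.exp(steps)
    return num / num.sum()

theta = 0.0
a = 1.0
thresholds = np.array([-1.0, 0.0, 1.0])   # hypothetical 4-category item

# Item parameter drift: the same item's step parameters shift at a later administration.
drift = 0.5
p_original = gpcm_probs(theta, a, thresholds)
p_drifted = gpcm_probs(theta, a, thresholds + drift)

print("original category probabilities:", np.round(p_original, 3))
print("drifted  category probabilities:", np.round(p_drifted, 3))
```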
Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling
IF 1.7
International Journal of Testing Pub Date : 2018-09-14 DOI: 10.1080/15305058.2018.1486316
Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He
{"title":"Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling","authors":"Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He","doi":"10.1080/15305058.2018.1486316","DOIUrl":"https://doi.org/10.1080/15305058.2018.1486316","url":null,"abstract":"The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. It was found that the expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data was reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible due to reasons of security or other risks.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1486316","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45405533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
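The article fits a variant of the Rasch model to the paired-comparison judgments; the sketch below uses the closely related Bradley-Terry formulation, recovering a difficulty scale from simulated win/loss judgments by gradient ascent on the log-likelihood. The number of objects, the number of comparisons, and the true scale values are all invented, and the estimation routine is a toy rather than the software used in the study.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical paired-comparison data: each tuple is (winner, loser), i.e.,
# the item judged to be the more difficult of the pair.
n_objects = 6
true_scale = np.linspace(-1.5, 1.5, n_objects)
pairs = []
for _ in range(400):
    i, j = rng.choice(n_objects, size=2, replace=False)
    p_i_wins = 1.0 / (1.0 + np.exp(-(true_scale[i] - true_scale[j])))
    pairs.append((i, j) if rng.random() < p_i_wins else (j, i))

# Bradley-Terry / Rasch-type paired-comparison model fitted by gradient ascent.
beta = np.zeros(n_objects)
for _ in range(2000):
    grad = np.zeros(n_objects)
    for w, l in pairs:
        p = 1.0 / (1.0 + np.exp(-(beta[w] - beta[l])))
        grad[w] += 1.0 - p
        grad[l] -= 1.0 - p
    beta += 0.01 * grad
    beta -= beta.mean()   # identification: center the scale at zero

print("true scale:     ", np.round(true_scale, 2))
print("estimated scale:", np.round(beta, 2))
```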
Analyzing Job Analysis Data Using Mixture Rasch Models
IF 1.7
International Journal of Testing Pub Date : 2018-09-14 DOI: 10.1080/15305058.2018.1481853
Adam E. Wyse
{"title":"Analyzing Job Analysis Data Using Mixture Rasch Models","authors":"Adam E. Wyse","doi":"10.1080/15305058.2018.1481853","DOIUrl":"https://doi.org/10.1080/15305058.2018.1481853","url":null,"abstract":"An important piece of validity evidence to support the use of credentialing exams comes from performing a job analysis of the profession. One common job analysis method is the task inventory method, where people working in the field are surveyed using rating scales about the tasks thought necessary to safely and competently perform the job. This article describes how mixture Rasch models can be used to analyze these data, and how results from these analyses can help to identify whether different groups of people may be responding to job tasks differently. Three examples from different credentialing programs illustrate scenarios that can be found when applying mixture Rasch models to job analysis data. Discussion of what these results may imply for the development of credentialing exams and other analyses of job analysis data is provided.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1481853","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47874147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
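As a minimal illustration of what a mixture Rasch model adds, the sketch below computes the posterior probability that a single response pattern belongs to each of two latent classes with class-specific item difficulties (essentially the E-step of an EM fit). It simplifies the article's rating-scale data to dichotomous endorsements and fixes the person parameter, so it is an assumption-laden toy rather than a full estimation routine; the class difficulties and weights are hypothetical.

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of endorsing an item (e.g., 'this task is part of my job')."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def class_responsibilities(x, theta, class_difficulties, class_weights):
    """Posterior probability that response pattern x belongs to each latent class,
    given class-specific item difficulties -- the E-step of a mixture Rasch fit."""
    post = []
    for b, w in zip(class_difficulties, class_weights):
        p = rasch_p(theta, b)
        lik = np.prod(np.where(x == 1, p, 1 - p))
        post.append(w * lik)
    post = np.array(post)
    return post / post.sum()

# Two hypothetical latent classes that order the same five job tasks differently.
class_difficulties = [
    np.array([-1.0, -0.5, 0.0, 0.5, 1.0]),    # class 1
    np.array([ 1.0,  0.5, 0.0, -0.5, -1.0]),  # class 2 (reversed task ordering)
]
class_weights = [0.6, 0.4]

x = np.array([1, 1, 1, 0, 0])   # one respondent's task endorsements
theta = 0.0                     # fixed person location, for illustration only

print(np.round(class_responsibilities(x, theta, class_difficulties, class_weights), 3))
```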
A Polytomous Model of Cognitive Diagnostic Assessment for Graded Data
IF 1.7
International Journal of Testing Pub Date : 2018-07-03 DOI: 10.1080/15305058.2017.1396465
Dongbo Tu, Chanjin Zheng, Yan Cai, Xuliang Gao, Daxun Wang
{"title":"A Polytomous Model of Cognitive Diagnostic Assessment for Graded Data","authors":"Dongbo Tu, Chanjin Zheng, Yan Cai, Xuliang Gao, Daxun Wang","doi":"10.1080/15305058.2017.1396465","DOIUrl":"https://doi.org/10.1080/15305058.2017.1396465","url":null,"abstract":"Pursuing the line of the difference models in IRT (Thissen & Steinberg, 1986), this article proposed a new cognitive diagnostic model for graded/polytomous data based on the deterministic input, noisy, and gate (Haertel, 1989; Junker & Sijtsma, 2001), which is named the DINA model for graded data (DINA-GD). We investigated the performance of a full Bayesian estimation of the proposed model. In the simulation, the classification accuracy and item recovery for the DINA-GD model were investigated. The results indicated that the proposed model had acceptable examinees' correct attribute classification rate and item parameter recovery. In addition, a real-data example was used to illustrate the application of this new model with the graded data or polytomously scored items.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1396465","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49274990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
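The DINA-GD model builds on the standard dichotomous DINA model, whose item response function is easy to state: a respondent who masters every attribute an item requires answers correctly unless they slip, and otherwise succeeds only by guessing. The sketch below shows that base model with a hypothetical Q-matrix entry and slip/guess values; the graded-data extension's exact parameterization is not reproduced here.

```python
import numpy as np

def dina_p_correct(alpha, q, slip, guess):
    """Dichotomous DINA probability of a correct response.
    alpha: examinee attribute mastery vector (0/1)
    q:     item's required-attribute vector from the Q-matrix (0/1)
    eta is 1 if the examinee masters every attribute the item requires."""
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) ** eta * guess ** (1 - eta)

# Hypothetical item requiring attributes 1 and 3 (of 4), with slip .1 and guess .2.
q = np.array([1, 0, 1, 0])
masters_all = np.array([1, 0, 1, 1])
lacks_one = np.array([1, 0, 0, 1])

print(dina_p_correct(masters_all, q, slip=0.1, guess=0.2))  # 0.9
print(dina_p_correct(lacks_one, q, slip=0.1, guess=0.2))    # 0.2
```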