Journal of Educational and Behavioral Statistics: Latest Articles

Forced-Choice Ranking Models for Raters’ Ranking Data
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-07-07 DOI: 10.3102/10769986221104207
Su-Pin Hung, Hung-Yu Huang
Abstract: To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees’ performance also contribute to rater bias or errors; consequently, forced-choice items have recently been employed for raters to rate how a ratee performs in certain defined traits. This study develops forced-choice ranking models (FCRMs) for data analysis when performance is evaluated by external raters or experts in a forced-choice ranking format. The proposed FCRMs consider different degrees of raters’ leniency/severity when modeling the selection probability in the generalized unfolding item response theory framework. They include an additional topic facet when multiple tasks are evaluated and incorporate variations in leniency parameters to capture the interactions between ratees and raters. The simulation results indicate that the parameters of the new models can be satisfactorily recovered and that better parameter recovery is associated with more item blocks, larger sample sizes, and a complete ranking design. A technological creativity assessment is presented as an empirical example with which to demonstrate the applicability and implications of the new models.
Volume 47, pages 603-634
Citations: 1
Assessing Inter-rater Reliability With Heterogeneous Variance Components Models: Flexible Approach Accounting for Contextual Variables
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-07-05 DOI: 10.3102/10769986221150517
Patrícia Martinková, František Bartoš, M. Brabec
Abstract: Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater’s or ratee’s gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error and to increase IRR by focusing on the most relevant subgroups. In this study, we propose a flexible approach for assessing IRR in cases of heterogeneity due to covariates by directly modeling differences in variance components. We use Bayes factors (BFs) to select the best performing model, and we suggest using Bayesian model averaging as an alternative approach for obtaining IRR and variance component estimates, allowing us to account for model uncertainty. We use inclusion BFs considering the whole model space to provide evidence for or against differences in variance components due to covariates. The proposed method is compared with other Bayesian and frequentist approaches in a simulation study, and we demonstrate its superiority in some situations. Finally, we provide real data examples from grant proposal peer review, demonstrating the usefulness of this method and its flexibility in the generalization of more complex designs.
Volume 48, pages 349-383
Citations: 3
Pooling Interactions Into Error Terms in Multisite Experiments
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-07-04 DOI: 10.3102/10769986221104800
Wendy Chan, L. Hedges
Abstract: Multisite field experiments using the (generalized) randomized block design that assign treatments to individuals within sites are common in education and the social sciences. Under this design, there are two possible estimands of interest and they differ based on whether sites or blocks have fixed or random effects. When the average treatment effect is assumed to be identical across sites, it is common to omit site by treatment interactions and “pool” them into the error term in classical experimental design. However, prior work has not addressed the consequences of pooling when site by treatment interactions are not zero. This study assesses the impact of pooling on inference in the presence of nonzero site by treatment interactions. We derive the small sample distributions of the test statistics for treatment effects under pooling and illustrate the impacts on rejection rates when interactions are not zero. We use the results to offer recommendations to researchers conducting studies based on the multisite design.
Volume 47, pages 639-665
Citations: 1
Improving Accuracy and Stability of Aggregate Student Growth Measures Using Empirical Best Linear Prediction
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-06-27 DOI: 10.3102/10769986221101624
J. R. Lockwood, K. Castellano, D. McCaffrey
Abstract: Many states and school districts in the United States use standardized test scores to compute annual measures of student achievement progress and then use school-level averages of these growth measures for various reporting and diagnostic purposes. These aggregate growth measures can vary consequentially from year to year for the same school, complicating their use and interpretation. We develop a method, based on the theory of empirical best linear prediction, to improve the accuracy and stability of aggregate growth measures by pooling information across grades, years, and tested subjects for individual schools. We demonstrate the performance of the method using both simulation and application to 6 years of annual growth measures from a large, urban school district. We provide code for implementing the method in the package schoolgrowth for the R environment.
Volume 47, pages 544-575
Citations: 1
Speed–Accuracy Trade-Off? Not So Fast: Marginal Changes in Speed Have Inconsistent Relationships With Accuracy in Real-World Settings
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-06-08 DOI: 10.3102/10769986221099906
B. Domingue, K. Kanopka, B. Stenhaug, M. Sulik, Tanesia Beverly, Matthieu J. S. Brinkhuis, Ruhan Circi, Jessica Faul, Dandan Liao, Bruce McCandliss, Jelena Obradović, Chris Piech, Tenelle Porter, Project iLEAD Consortium, J. Soland, Jon Weeks, S. Wise, Jason D Yeatman
Abstract: The speed–accuracy trade-off (SAT) suggests that time constraints reduce response accuracy. Its relevance in observational settings, where response time (RT) may not be constrained but respondent speed may still vary, is unclear. Using 29 data sets containing data from cognitive tasks, we use a flexible method for identification of the SAT (which we test in extensive simulation studies) to probe whether the SAT holds. We find inconsistent relationships between time and accuracy; marginal increases in time use for an individual do not necessarily predict increases in accuracy. Additionally, the speed–accuracy relationship may depend on the underlying difficulty of the interaction. We also consider the analysis of items and individuals; of particular interest is the observation that respondents who exhibit more within-person variation in response speed are typically of lower ability. We further find that RT is typically a weak predictor of response accuracy. Our findings document a range of empirical phenomena that should inform future modeling of RTs collected in observational settings.
Volume 47, pages 576-602
Citations: 2
What Is Actually Equated in “Test Equating”? A Didactic Note
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-06-01 DOI: 10.3102/10769986211072308
Wim J. van der Linden
Abstract: The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. The definition is in contrast with Lord’s foundational paper which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions of scale and score is not trivial. The difference is explained by connecting these notions with standard statistical concepts as probability experiment, sample space, and random variable. The probability experiment underlying equating test forms with random scores immediately gives us the equating transformation as a function mapping the scale of one form into the other and thus supports the point of view taken by Lord. However, both Lord’s view and the current literature appear to rely on the idea of an experiment with random examinees which implies a different notion of test scores. It is shown how an explicit choice between the two experiments is not just important for our theoretical understanding of key notions in test equating but also has important practical consequences.
Volume 47, pages 353-362
Citations: 1
Two Statistical Tests for the Detection of Item Compromise
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-05-11 DOI: 10.3102/10769986221094789
W. van der Linden
Abstract: Two independent statistical tests of item compromise are presented, one based on the test takers’ responses and the other on their response times (RTs) on the same items. The tests can be used to monitor an item in real time during online continuous testing but are also applicable as part of post hoc forensic analysis. The two test statistics are simple intuitive quantities as the sum of the responses and RTs observed for the test takers on the item. Common features of the tests are ease of interpretation and computational simplicity. Both tests are uniformly most powerful under the assumption of known ability and speed parameters for the test takers. Examples of power functions for items with realistic parameter values suggest maximum power for 20–30 test takers with item preknowledge for the response-based test and 10–20 test takers for the RT-based test.
Volume 47, pages 485-504
Citations: 1
A Critical View on the NEAT Equating Design: Statistical Modeling and Identifiability Problems
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-04-29 DOI: 10.3102/10769986221090609
Ernesto San Martín, Jorge González
Abstract: The nonequivalent groups with anchor test (NEAT) design is widely used in test equating. Under this design, two groups of examinees are administered different test forms with each test form containing a subset of common items. Because test takers from different groups are assigned only one test form, missing score data emerge by design rendering some of the score distributions unavailable. The partially observed score data formally lead to an identifiability problem, which has not been recognized as such in the equating literature and has been considered from different perspectives, all of them making different assumptions in order to estimate the unidentified score distributions. In this article, we formally specify the statistical model underlying the NEAT design and unveil the lack of identifiability of the parameters of interest that compose the equating transformation. We use the theory of partial identification to show alternatives to traditional practices that have been proposed to identify the score distributions when conducting equating under the NEAT design.
Volume 47, pages 406-437
Citations: 1
Statistical Inference for G-indices of Agreement
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-04-29 DOI: 10.3102/10769986221088561
D. Bonett
Abstract: The limitations of Cohen’s κ are reviewed and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical inference methods for some important special cases of the multirater design also are derived. G-index meta-analysis methods are proposed and can be used to combine and compare agreement across two or more populations. Closed-form sample-size formulas to achieve desired confidence interval precision are proposed for two-rater and multirater designs. R functions are given for all results.
Volume 47, pages 438-458
Citations: 0
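Illustrative note on the entry above: the contrast between Cohen's κ and a G-index can be sketched in a few lines. This is a minimal sketch, not the article's R functions. It assumes the classical two-rater form G = (q·p_o − 1)/(q − 1), where q is the number of categories and p_o the observed proportion of agreement; the function name `agreement_indices` and the example table are hypothetical.

```python
def agreement_indices(table):
    """Return (p_o, kappa, g) for a square two-rater contingency table.

    table[i][j] = number of items rater A placed in category i
    and rater B placed in category j.
    Assumed G form: G = (q * p_o - 1) / (q - 1), q = number of categories.
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    # Observed proportion of agreement: mass on the diagonal.
    p_o = sum(table[i][i] for i in range(q)) / n
    # Marginal category proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(q)]
    col_marg = [sum(table[i][j] for i in range(q)) / n for j in range(q)]
    # Chance agreement under independence, as used by Cohen's kappa.
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    kappa = (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1
    g = (q * p_o - 1) / (q - 1)
    return p_o, kappa, g


# Hypothetical data: two raters, two categories, very unbalanced marginals.
table = [[90, 5],
         [5, 0]]
p_o, kappa, g = agreement_indices(table)
print(f"p_o={p_o:.3f} kappa={kappa:.3f} G={g:.3f}")
```

With these made-up counts the raters agree on 90% of items, yet κ comes out slightly negative because the skewed marginals push chance agreement above 0.9, while the assumed G form depends only on p_o and q. This sensitivity to marginals is one commonly cited limitation of κ.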
Latent Trait Item Response Models for Continuous Responses
IF 2.4, Zone 3, Psychology
Journal of Educational and Behavioral Statistics Pub Date : 2022-04-08 DOI: 10.3102/10769986231184147
G. Tutz, Pascal Jordan
Abstract: A general framework of latent trait item response models for continuous responses is given. In contrast to classical test theory (CTT) models, which traditionally distinguish between true scores and error scores, the responses are clearly linked to latent traits. It is shown that CTT models can be derived as special cases, but the model class is much wider. It provides, in particular, appropriate modeling of responses that are restricted in some way, for example, if responses are positive or are restricted to an interval. Restrictions of this sort are easily incorporated in the modeling framework. Restriction to an interval is typically ignored in common models yielding inappropriate models, for example, when modeling Likert-type data. The model also extends common response time models, which can be treated as special cases. The properties of the model class are derived and the role of the total score is investigated, which leads to a modified total score. Several applications illustrate the use of the model including an example, in which covariates that may modify the response are taken into account.
Citations: 0