{"title":"Using Rater Cognition to Improve Generalizability of an Assessment of Scientific Argumentation","authors":"Katrina Borowiec, Courtney Castle","doi":"10.7275/EY9D-P954","DOIUrl":null,"url":null,"abstract":"Rater cognition or “think-aloud” studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the Next Generation Science Standards (NGSS) , the present study illustrates the utility of extending “think-aloud” studies to science assessment. The study focuses on the development of rubrics for scientific argumentation, one of the NGSS Science and Engineering practices. The initial rubrics were modified based on cognitive interviews with five raters. Next, a group of four new raters scored responses using the original and revised rubrics. A psychometric analysis was conducted to measure change in interrater reliability, accuracy, and generalizability (using a generalizability study or “g-study”) for the original and revised rubrics. Interrater reliability, accuracy, and generalizability increased with the rubric modifications. Furthermore, follow-up interviews with the second group of raters indicated that most raters preferred the revised rubric. These findings illustrate that cognitive interviews with raters can be used to enhance rubric usability and generalizability when assessing scientific argumentation, thereby improving assessment validity.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Practical Assessment, Research and Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7275/EY9D-P954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Social Sciences","Score":null,"Total":0}
引用次数: 3
Abstract
Rater cognition or “think-aloud” studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the Next Generation Science Standards (NGSS), the present study illustrates the utility of extending “think-aloud” studies to science assessment. The study focuses on the development of rubrics for scientific argumentation, one of the NGSS Science and Engineering Practices. The initial rubrics were modified based on cognitive interviews with five raters. Next, a group of four new raters scored responses using the original and revised rubrics. A psychometric analysis was conducted to measure the change in interrater reliability, accuracy, and generalizability (via a generalizability study, or “g-study”) between the original and revised rubrics. Interrater reliability, accuracy, and generalizability all increased with the rubric modifications. Furthermore, follow-up interviews with the second group of raters indicated that most raters preferred the revised rubric. These findings illustrate that cognitive interviews with raters can be used to enhance rubric usability and generalizability when assessing scientific argumentation, thereby improving assessment validity.
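
For readers unfamiliar with g-studies, the sketch below shows a minimal version of the computation for a fully crossed persons × raters design: variance components are estimated from the two-way ANOVA expected mean squares, then combined into relative and absolute generalizability coefficients. The data and design here are hypothetical illustrations, not the study's; the paper's actual g-study design and estimates are given in the full text.

```python
# Minimal g-study sketch for a fully crossed persons x raters design
# with one score per cell. Hypothetical data, not from the study.
import numpy as np

def g_study(scores: np.ndarray) -> dict:
    """Estimate variance components and G coefficients for a
    persons (rows) x raters (columns) crossed design using the
    standard two-way ANOVA expected-mean-squares equations."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Sums of squares for persons, raters, and the residual
    # (person x rater interaction confounded with error).
    ss_p = n_r * ((person_means - grand) ** 2).sum()
    ss_r = n_p * ((rater_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Variance components from expected mean squares
    # (negative estimates truncated at zero, a common convention).
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)

    # Relative coefficient (rank-ordering decisions) and absolute
    # coefficient (criterion-referenced decisions) for n_r raters.
    g_rel = var_p / (var_p + var_res / n_r)
    g_abs = var_p / (var_p + (var_r + var_res) / n_r)
    return {"var_person": var_p, "var_rater": var_r,
            "var_residual": var_res,
            "g_relative": g_rel, "g_absolute": g_abs}

# Hypothetical example: 6 responses scored by 4 raters on a 0-4 rubric.
scores = np.array([[3, 3, 2, 3],
                   [1, 1, 1, 2],
                   [4, 3, 4, 4],
                   [2, 2, 1, 2],
                   [0, 1, 0, 0],
                   [3, 4, 3, 3]], dtype=float)
print(g_study(scores))
```

In this framing, a rubric revision that reduces the rater and residual variance components relative to the person component raises both coefficients, which is the sense in which the abstract reports that generalizability "increased with the rubric modifications."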