Rater Severity/Leniency and Bias in EFL Students' Composition Using Many-Facet Rasch Measurement (MFRM)

Q4 Arts and Humanities
Scope · Pub Date: 2023-10-16 · DOI: 10.30998/scope.v8i1.19432
Yenni Arif Rahman, Fitri Apriyanti, Rahmi Aulia Nurdini
Citations: 0

Abstract


The study investigates the extent to which raters are overly severe, lenient, or even biased when evaluating students' writing compositions in Indonesia. Data were collected from 15 student essays scored by four raters holding master's degrees in English education. Many-facet Rasch measurement (MFRM), run in Minifac, a software package created for MFRM analysis, was used for data analysis. The assessment process was decomposed into its distinct facets: raters, essay items, and the specific traits or criteria evaluated in the writing rubric. Each rater's level of severity or leniency, that is, how strictly or leniently they assign scores, was scrutinized, and the potential biases raters might introduce into the grading process were likewise examined. The findings revealed that, while the raters applied the rubric consistently across all test takers, they varied in how lenient or severe they were. Scores of 70 were given more frequently than any other score. Based on these findings, composition raters may differ in how they rate students, potentially leading to student dissatisfaction, particularly when raters score severely. The bias analysis showed that certain raters consistently tended to score items inaccurately, deviating from the established criteria (traits). Furthermore, the study found that having more than four items/criteria (content, diction, structure, and mechanics) is essential to achieve a more diverse distribution of item difficulty and to measure students' writing abilities effectively. These results can help writing departments improve their oversight of inter-rater reliability and rating consistency. To address this issue, implementing rater training is suggested as the most feasible way to ensure more dependable and consistent evaluations.
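To illustrate the model underlying this kind of analysis (a conceptual sketch, not the authors' actual Minifac estimation), the dichotomous many-facet Rasch model expresses the log-odds of a successful response as student ability minus item difficulty minus rater severity, all on a shared logit scale. The function names and parameter values below are illustrative assumptions:

```python
import math

def mfrm_logit(ability, item_difficulty, rater_severity):
    """Log-odds of success under a dichotomous many-facet Rasch model:
    logit P = ability - item difficulty - rater severity (all in logits)."""
    return ability - item_difficulty - rater_severity

def success_prob(ability, item_difficulty, rater_severity):
    """Probability of success: the logistic transform of the MFRM logit."""
    z = mfrm_logit(ability, item_difficulty, rater_severity)
    return 1.0 / (1.0 + math.exp(-z))

# For the same student (ability 0.5 logits) on the same item (difficulty 0.0),
# a severe rater (severity +1.0) lowers the expected score, while a lenient
# rater (severity -1.0) raises it.
p_severe = success_prob(0.5, 0.0, 1.0)
p_lenient = success_prob(0.5, 0.0, -1.0)
```

Software such as Facets/Minifac estimates all three facet parameters jointly from the observed ratings; separating rater severity from student ability in this way is what lets the analysis flag individual raters as systematically harsh, lenient, or biased toward particular items.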

Journal: Scope (Arts and Humanities - Visual Arts and Performing Arts)
Self-citation rate: 0.00%
Articles published: 39