DIF investigations across groups of gender and academic background in a large-scale high-stakes language test 

Impact Factor: 0.1 · JCR Q4 (Linguistics)
Xia-li Song, Liying Cheng, D. Klinger
{"title":"DIF investigations across groups of gender and academic background in a large-scale high-stakes language test ","authors":"Xia-li Song, Liying Cheng, D. Klinger","doi":"10.58379/rshg8366","DOIUrl":null,"url":null,"abstract":"High-stakes pre-entry language testing is the predominate tool used to measure test takers’ proficiency for admission purposes in higher education in China. Given the important role of these tests, there are heated discussions about how to ensure test fairness for different groups of test takers. This study examined the fairness of the Graduate School Entrance English Examination (GSEEE) that is used to decide whether over one million test takers can enter master’s programs in China. Using SIBTEST and content analysis, the study investigated differential item functioning (DIF) and the presence of potential bias on the GSEEE with aspects to groups of gender and academic background. Results found that a large percentage of the GSEEE items did not provide reliable results to distinguish good and poor performers. A number of DIF and DBF functioned differentially and three test reviewers identified a myriad of factors such as motivation and learning styles that potentially contributed to group performance differences. However, consistent evidence was not found to suggest these flagged items/texts exhibited bias. While systematic bias may not have been detected, the results revealed poor test reliability and the study highlighted an urgent need to improve test quality and clarify the purpose of the test. DIF issues may be revisited once test quality has been improved.","PeriodicalId":29650,"journal":{"name":"Studies in Language Assessment","volume":"90 1","pages":""},"PeriodicalIF":0.1000,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Language Assessment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.58379/rshg8366","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 12

Abstract

High-stakes pre-entry language testing is the predominant tool used to measure test takers’ proficiency for admission purposes in higher education in China. Given the important role of these tests, there are heated discussions about how to ensure test fairness for different groups of test takers. This study examined the fairness of the Graduate School Entrance English Examination (GSEEE), which is used to decide whether over one million test takers can enter master’s programs in China. Using SIBTEST and content analysis, the study investigated differential item functioning (DIF) and the presence of potential bias on the GSEEE with respect to gender and academic background groups. The results showed that a large percentage of the GSEEE items did not provide reliable results for distinguishing good and poor performers. A number of items and item bundles exhibited DIF and differential bundle functioning (DBF), and three test reviewers identified a range of factors, such as motivation and learning styles, that potentially contributed to group performance differences. However, no consistent evidence was found to suggest that these flagged items/texts exhibited bias. While systematic bias may not have been detected, the results revealed poor test reliability, and the study highlighted an urgent need to improve test quality and clarify the purpose of the test. DIF issues may be revisited once test quality has been improved.
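
For readers unfamiliar with the method, SIBTEST flags DIF by comparing the expected score on a studied item (or bundle of items) between a reference group and a focal group after matching examinees on a valid-subtest score built from the remaining items. The sketch below is a minimal, simplified illustration of that idea in Python/NumPy; the function name, the toy data, and the omission of SIBTEST's regression correction for measurement error in the matching score are assumptions for illustration only and do not reproduce the authors' analysis.

```python
import numpy as np

def simplified_sibtest(responses, group, studied_items):
    """Simplified, uncorrected SIBTEST-style statistic (illustration only).

    responses     : (n_examinees, n_items) array of 0/1 item scores
    group         : length-n array, 0 = reference group, 1 = focal group
    studied_items : indices of the studied item(s) or bundle

    Returns beta_hat (a weighted mean difference in studied-bundle score
    across matching-score strata) and an approximate z statistic.
    The operational SIBTEST procedure additionally applies a regression
    correction to the stratum means before pooling.
    """
    responses = np.asarray(responses, dtype=float)
    group = np.asarray(group)
    studied = np.zeros(responses.shape[1], dtype=bool)
    studied[studied_items] = True

    match_score = responses[:, ~studied].sum(axis=1)   # valid-subtest score
    bundle_score = responses[:, studied].sum(axis=1)   # studied bundle score

    beta_num, var_num, weight_sum = 0.0, 0.0, 0
    for k in np.unique(match_score):
        ref = bundle_score[(match_score == k) & (group == 0)]
        foc = bundle_score[(match_score == k) & (group == 1)]
        if len(ref) < 2 or len(foc) < 2:
            continue                                   # skip sparse strata
        w = len(foc)                                   # weight by focal-group count
        beta_num += w * (ref.mean() - foc.mean())
        var_num += w**2 * (ref.var(ddof=1) / len(ref) + foc.var(ddof=1) / len(foc))
        weight_sum += w

    beta_hat = beta_num / weight_sum
    se = np.sqrt(var_num) / weight_sum
    return beta_hat, beta_hat / se


# Toy usage with random (hypothetical) data, for shape only:
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(500, 40))
groups = rng.integers(0, 2, size=500)
beta, z = simplified_sibtest(data, groups, studied_items=[39])
print(f"beta_hat = {beta:.3f}, z = {z:.2f}")
```

A large positive (or negative) z would indicate that, conditional on the matching score, one group systematically outperforms the other on the studied item or bundle, which is the pattern the study's DIF/DBF flags are based on.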