How Many Raters Can Be Enough: G Theory Applied to Assessment and Measurement of L2 Speech Perception

Kevin Hirschi, Okim Kang
Language Teaching Research Quarterly. Published 2023-11-01. DOI: 10.32038/ltrq.2023.37.12 (https://doi.org/10.32038/ltrq.2023.37.12)
Citations: 0

Abstract

This paper extends the use of Generalizability Theory (G theory) to the measurement of extemporaneous L2 speech through the lens of speech perception. Using six datasets from previous studies, it reports on G studies, which break down measurement variance into its sources, and D studies, which predict how reliability changes when the number of raters, items, or other facets is modified. Together these help the field adopt measurement designs that include comprehensibility, accentedness, and intelligibility. When data from a single audio sample per learner were subjected to D studies, both semantic differential and rubric scales for comprehensibility were reliable at the .90 level with about 15 trained raters or 50 untrained crowdsourced raters. To support generalizable and dependable evaluations, empirically informed recommendations are given, including considerations for the number of speech samples rated and the granularity of the scales for various assessment and research purposes.
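The relationship between a G study and a D study can be illustrated with a minimal sketch. It assumes the simplest case, a fully crossed persons x raters design with one rating per cell, and uses the standard expected-mean-square estimators and the usual relative (G) and absolute (Phi) coefficients. The function names (`g_study`, `d_study`, `min_raters`) and the variance components in `hypothetical` are illustrative assumptions; they are not taken from the paper's datasets and do not reproduce its analyses.

```python
from typing import Dict, List, Optional


def g_study(scores: List[List[float]]) -> Dict[str, float]:
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is the rating rater r gave speaker p (one rating per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected mean squares; negative estimates are set to zero.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": max(ms_res, 0.0),
    }


def d_study(components: Dict[str, float], n_raters: int) -> Dict[str, float]:
    """Project reliability for a design that averages over n_raters raters."""
    person = components["person"]
    relative_error = components["residual"] / n_raters
    absolute_error = (components["rater"] + components["residual"]) / n_raters
    return {
        "g": person / (person + relative_error),    # relative decisions
        "phi": person / (person + absolute_error),  # absolute decisions
    }


def min_raters(components: Dict[str, float], target: float = 0.90,
               index: str = "g", max_raters: int = 1000) -> Optional[int]:
    """Smallest number of raters whose projected coefficient reaches target."""
    for n in range(1, max_raters + 1):
        if d_study(components, n)[index] >= target:
            return n
    return None


# Hypothetical variance components, chosen only to show the mechanics.
hypothetical = {"person": 2.0, "rater": 0.5, "residual": 1.0}
print(min_raters(hypothetical, 0.90, "g"))    # 5
print(min_raters(hypothetical, 0.90, "phi"))  # 7
```

In this sketch, a larger residual relative to the person variance pushes the required number of raters up, which is the general mechanism by which a noisier rater pool would need more raters to reach the same reliability level.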