How Many Raters Can Be Enough: G Theory Applied to Assessment and Measurement of L2 Speech Perception

Language Teaching Research Quarterly. Pub Date: 2023-11-01. DOI: 10.32038/ltrq.2023.37.12

Kevin Hirschi, Okim Kang

Citations: 0

Abstract

This paper extends the use of Generalizability Theory to the measurement of extemporaneous L2 speech through the lens of speech perception. Using six datasets from previous studies, it reports on G studies (a method of breaking down measurement variance) and D studies (predictive studies of how reliability changes when the number of raters, items, or other facets is modified) to assist the field in adopting measurement designs that include comprehensibility, accentedness, and intelligibility. When data from a single audio sample per learner were subjected to D studies, both semantic differential and rubric scales for comprehensibility were reliable at the .90 level with about 15 trained raters or 50 untrained crowdsourced raters. To offer generalizable and dependable evaluations, empirically informed recommendations are given, including considerations for the number of speech samples rated and the granularity of the scales for various assessment and research purposes.
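
The G-study and D-study logic described in the abstract can be illustrated with a minimal sketch. It assumes the simplest case, a fully crossed persons x raters design with one rating per cell, and uses the standard expected-mean-square estimators. The function names, the toy score matrix, and the variance components below are hypothetical illustrations, not the authors' data or analysis, which may involve additional facets.

```python
def g_study(scores):
    """G study: estimate variance components for a crossed persons x raters design.

    scores[p][r] is the rating given to person p by rater r (one rating per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected mean squares; negative estimates are truncated to zero.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": ms_res,
    }


def g_coefficient(var, n_raters):
    """D study: relative G coefficient for the mean of n_raters ratings."""
    return var["person"] / (var["person"] + var["residual"] / n_raters)


def raters_needed(var, target=0.90, max_raters=1000):
    """D study: smallest number of raters whose mean rating reaches the target."""
    for n in range(1, max_raters + 1):
        if g_coefficient(var, n) >= target:
            return n
    return None  # target not reachable within max_raters


# Toy data (hypothetical): 4 speakers rated by 3 raters.
toy = [[2, 3, 2], [4, 5, 5], [6, 6, 7], [8, 9, 8]]
components = g_study(toy)
print(components)

# Hypothetical variance components for a noisier rating task.
hypothetical = {"person": 1.0, "rater": 0.2, "residual": 1.5}
for n in (1, 5, 15, 50):
    print(n, round(g_coefficient(hypothetical, n), 3))
print("raters needed for .90:", raters_needed(hypothetical))
```

In this sketch the rater variance component does not enter the relative coefficient, because a rater who is uniformly harsh or lenient shifts every speaker's score equally. An absolute (criterion-referenced) dependability index would add the rater component, divided by the number of raters, to the error term.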

