C. Nagle
Journal of Second Language Pronunciation
DOI: 10.1075/JSLP.18016.NAG
Published: 2019-09-17
Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk
Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.
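The abstract names intraclass correlation coefficients as the estimate of group-level interrater reliability. As an illustration only (the paper does not publish its analysis code, and the specific ICC form it used is not stated here), the sketch below computes ICC(2,k) — two-way random effects, absolute agreement, reliability of the mean of k raters, a common choice for pooled speech ratings — from a complete targets-by-raters matrix using the standard two-way ANOVA mean squares.

```python
import numpy as np

def icc2k(ratings: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, absolute agreement,
    reliability of the average of k raters.

    ratings: (n_targets, k_raters) matrix with no missing values,
    e.g. rows = L2 speech samples, columns = crowdsourced raters.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-target means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Sums of squares for a two-way ANOVA without replication
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-targets mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    # Rater severity differences (msc) penalize absolute agreement
    return (msr - mse) / (msr + (msc - mse) / n)

# Hypothetical data: 10 samples, 3 raters who agree up to a small
# constant severity offset, so reliability should be near 1.
base = np.arange(1.0, 11.0)
demo = np.column_stack([base, base + 0.1, base - 0.1])
print(round(icc2k(demo), 3))
```

In practice one would typically reach for a vetted implementation (e.g. the `intraclass_corr` function in the pingouin package) rather than hand-rolling the ANOVA, but the mean-square decomposition above is what such functions compute internally for the two-way random-effects forms.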