Assessing the quality of TTS audio in the LARA learning-by-reading platform

Elham Akhlaghi, A. Bączkowska, Harald Berthelsen, Branislav Bédi, Cathy Chua, C. Cucchiarini, Hanieh Habibi, Ivana Horváthová, Pernille Hvalsøe, Roy Lotz, Christèle Maizonniaux, Neasa Ní Chiaráin, Manny Rayner, Nikos Tsourakis, Chunlin Yao

CALL and professionalisation: short papers from EUROCALL 2021, 13 December 2021. DOI: https://doi.org/10.14705/rpnet.2021.54.1299
Citations: 3
Abstract
A popular idea in Computer Assisted Language Learning (CALL) is to use
multimodal annotated texts, with annotations typically including embedded
audio and translations, to support L2 learning through reading. An important
question is how to create the audio, which can be done either through human
recording or by a Text-To-Speech (TTS) synthesis engine. We may reasonably
expect TTS to be quicker and easier, but human recordings to be of higher quality.
Here, we report a study using the open-source LARA platform and ten
languages. Samples of LARA audio totaling about three and a half minutes
were provided for each language in both human and TTS form; subjects used a
web form to compare different versions of the same item and rate the voices
as a whole. Although the human voice was more often preferred, TTS achieved
higher ratings in some languages and came close in others.
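
The paper itself includes no code, but the evaluation design described above (paired human/TTS samples, rated per language via a web form) lends itself to a simple aggregation step. The following is a minimal Python sketch of how such responses might be tallied; the record structure, field names, and 1-5 rating scale are illustrative assumptions, not the study's actual instrument.

    from collections import defaultdict

    # Hypothetical response records from the comparison web form.
    # Each subject compares the human and TTS version of the same
    # item, states a preference, and rates each voice (assumed 1-5).
    responses = [
        {"language": "Icelandic", "preferred": "human", "human_rating": 5, "tts_rating": 3},
        {"language": "Icelandic", "preferred": "tts",   "human_rating": 3, "tts_rating": 4},
        {"language": "Irish",     "preferred": "tts",   "human_rating": 4, "tts_rating": 5},
        {"language": "Irish",     "preferred": "human", "human_rating": 5, "tts_rating": 4},
    ]

    def summarise(records):
        """Tally preference counts and mean ratings per language."""
        by_lang = defaultdict(lambda: {"human_pref": 0, "tts_pref": 0,
                                       "human_sum": 0, "tts_sum": 0, "n": 0})
        for r in records:
            s = by_lang[r["language"]]
            s["human_pref" if r["preferred"] == "human" else "tts_pref"] += 1
            s["human_sum"] += r["human_rating"]
            s["tts_sum"] += r["tts_rating"]
            s["n"] += 1
        return {
            lang: {
                "human_preferred": s["human_pref"],
                "tts_preferred": s["tts_pref"],
                "mean_human": s["human_sum"] / s["n"],
                "mean_tts": s["tts_sum"] / s["n"],
            }
            for lang, s in by_lang.items()
        }

    for lang, stats in summarise(responses).items():
        print(lang, stats)

Per-language aggregation of this kind would make visible the pattern the abstract reports: an overall preference for the human voice, with TTS matching or exceeding it in some languages.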