Assessing the quality of TTS audio in the LARA learning-by-reading platform

Elham Akhlaghi, A. Bączkowska, Harald Berthelsen, Branislav Bédi, Cathy Chua, C. Cucchiarini, Hanieh Habibi, Ivana Horváthová, Pernille Hvalsøe, Roy Lotz, Christèle Maizonniaux, Neasa Ní Chiaráin, Manny Rayner, Nikos Tsourakis, Chunlin Yao
Published in: CALL and professionalisation: short papers from EUROCALL 2021 (2021-12-13)
DOI: 10.14705/rpnet.2021.54.1299 (https://doi.org/10.14705/rpnet.2021.54.1299)
Citations: 3

Abstract

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create the audio, which can be done either by human recording or by a Text-To-Speech (TTS) synthesis engine. We may reasonably expect TTS to be quicker and easier, but human recordings to be of higher quality. Here, we report a study using the open-source LARA platform and ten languages. Samples of LARA audio totaling about three and a half minutes were provided for each language in both human and TTS form; subjects used a web form to compare different versions of the same item and to rate the voices as a whole. Although the human voice was more often preferred, TTS achieved higher ratings in some languages and was close in others.