Perception of Paralinguistic Traits in Synthesized Voices

Alice Baird, Stina Marie Hasse Jørgensen, Emilia Parada-Cabaleiro, Simone Hantke, N. Cummins, Björn Schuller
Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, published 2017-08-23
DOI: 10.1145/3123514.3123528 (https://doi.org/10.1145/3123514.3123528)
Citations: 15

Abstract

Along with the rise of artificial intelligence and the Internet of Things, synthesized voices are now common in daily life, providing us with guidance, assistance, and even companionship. From formant to concatenative synthesis, the synthesized voice continues to be defined by the same traits we ascribe to ourselves. When the recorded voice is synthesized, does our perception of its new machine embodiment change, and can we consider an alternative, more inclusive form? To begin evaluating the impact of aesthetic design, this study presents a first-step perception test to explore the paralinguistic traits of the synthesized voice. Using a corpus of 13 synthesized voices constructed with acoustic concatenative speech synthesis, we assessed the responses of 23 listeners from differing cultural backgrounds. To evaluate whether perception shifts away from the defined traits, we asked listeners to assign traits of age, gender, accent origin, and human-likeness. Results show a difference in perception for age and human-likeness across voices, and general agreement across listeners for both gender and accent origin. Connections found between age, gender, and human-likeness call for further exploration of a more participatory and inclusive synthesized vocal identity.