Perception and social evaluation of cloned and recorded voices: Effects of familiarity and self-relevance

Computers in Human Behavior: Artificial Humans | Pub Date: 2025-03-25 | DOI: 10.1016/j.chbah.2025.100143

Victor Rosi, Emma Soopramanien, Carolyn McGettigan

Volume 4, Article 100143. Full text: https://www.sciencedirect.com/science/article/pii/S2949882125000271

Abstract

Modern speech technologies enable the artificial replication, or cloning, of the human voice. In the present study, we investigated whether listeners' perception and social evaluation of state-of-the-art voice clones depend on whether the clone being heard is a replica of the self, a friend, or a total stranger. We recorded and cloned the voices of familiar pairs of adult participants. Forty-seven of these experimental participants (and 47 unfamiliar controls) rated the Trustworthiness, Attractiveness, Competence, and Dominance of cloned and recorded samples of their own voice and their friend's voice. We observed that while familiar listeners found clones to sound less (or similarly) trustworthy, attractive, and competent than recordings, unfamiliar listeners showed an opposing profile in which clones tended to be rated higher than recordings. Within this, familiar listeners tended to prefer their friend's voice to their own, although perceived similarity of both self- and friend-voice clones to the original speaker identity predicted higher ratings on all trait scales. Overall, we find that familiar listeners' impressions are sensitive to the perceived accuracy and authenticity of cloning for voices they know well, while unfamiliar listeners tend to prefer the synthetic versions of those same voice identities. The latter observation may relate to the tendency of generative voice synthesis models to homogenise speaking accents and styles, such that they more closely approximate (preferred) norms.
