Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users

Caluã de Lacerda Pataca, Matthew Watkins, Roshan Peiris, Sooyeon Lee, Matt Huenerfauth
{"title":"Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users","authors":"Caluã de Lacerda Pataca, Matthew Watkins, Roshan Peiris, Sooyeon Lee, Matt Huenerfauth","doi":"10.1145/3544548.3581511","DOIUrl":null,"url":null,"abstract":"Speech is expressive in ways that caption text does not capture, with emotion or emphasis information not conveyed. We interviewed eight Deaf and Hard-of-Hearing (dhh) individuals to understand if and how captions’ inexpressiveness impacts them in online meetings with hearing peers. Automatically captioned speech, we found, lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand is an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 dhh participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.","PeriodicalId":314098,"journal":{"name":"Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems","volume":"109 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3544548.3581511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Speech is expressive in ways that caption text does not capture, with emotion or emphasis information not conveyed. We interviewed eight Deaf and Hard-of-Hearing (dhh) individuals to understand if and how captions’ inexpressiveness impacts them in online meetings with hearing peers. Automatically captioned speech, we found, lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand is an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 dhh participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.
语言韵律和情感在字幕中的可视化:聋人和听力障碍者的可及性
言语的表达方式是文字标题无法捕捉到的,它带有情感或强调信息,无法传达。我们采访了8位聋人和听力障碍者(dhh),以了解字幕的缺乏表达是否以及如何影响他们与听力障碍者的在线会议。我们发现,自动配字幕的演讲缺乏情感深度,导致难以解析的模糊性和普遍的沉闷。受访者经常感到被排除在外,有些人认为这是这些类型会议的固有品质,而不是当前标题文本设计的结果。接下来,我们开发了三种新的字幕模型,除了文字之外,还描述了韵律、情感和两者的混合特征。在一项实证研究中,16名dhh参与者将这些模型与传统字幕进行了比较。基于情感的模型在描述情感和强调方面优于传统的字幕,仅在易读性上有适度的损失,这表明它有潜力成为更具包容性的字幕设计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信