Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. Pub Date: 2023-04-19. DOI: 10.1145/3544548.3581511

Caluã de Lacerda Pataca, Matthew Watkins, Roshan Peiris, Sooyeon Lee, Matt Huenerfauth


Citation count: 1

Abstract

Speech is expressive in ways that caption text does not capture: information about emotion and emphasis is not conveyed. We interviewed eight Deaf and Hard-of-Hearing (DHH) individuals to understand if and how captions’ inexpressiveness impacts them in online meetings with hearing peers. Automatically captioned speech, we found, lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand as an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 DHH participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.


