理解人类声音表达的多模态方法及超越

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI:10.1145/3242969.3243391

Shrikanth S. Narayanan

{"title":"理解人类声音表达的多模态方法及超越","authors":"Shrikanth S. Narayanan","doi":"10.1145/3242969.3243391","DOIUrl":null,"url":null,"abstract":"Human verbal and nonverbal expressions carry crucial information not only about intent but also emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms including the variability therein, within and across individuals. From a technology perspective, finding ways for automatically processing and decoding this complex information continues to be of interest across a variety of applications. The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body especially in encoding affect. The talk will draw data from various domains notably in health to illustrate some of the applications.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Multimodal Approach to Understanding Human Vocal Expressions and Beyond\",\"authors\":\"Shrikanth S. Narayanan\",\"doi\":\"10.1145/3242969.3243391\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human verbal and nonverbal expressions carry crucial information not only about intent but also emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms including the variability therein, within and across individuals. From a technology perspective, finding ways for automatically processing and decoding this complex information continues to be of interest across a variety of applications. The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body especially in encoding affect. The talk will draw data from various domains notably in health to illustrate some of the applications.\",\"PeriodicalId\":308751,\"journal\":{\"name\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3242969.3243391\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242969.3243391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

人类的语言和非语言表达不仅包含意图，还包含情感、个人身份、健康和幸福状态等重要信息。从基础科学的角度来看，理解这些丰富的信息是如何在这些信号中编码的，可以阐明潜在的生产机制，包括个体内部和个体之间的可变性。从技术角度来看，寻找自动处理和解码这些复杂信息的方法仍然是各种应用程序感兴趣的问题。传感、通信和计算技术的融合使人们能够以几年前难以想象的方式以各种形式和方式获取数据。这些数据包括对人类表情生成进行多模态分析和解释的数据。演讲的第一部分将重点介绍使我们能够利用实时成像和音频建模对声音产生的动态进行调查的进展，以提供关于我们如何用声乐乐器产生语音和歌曲的见解。演讲的第二部分将集中于声音表达的产生与面部和身体的其他信号的结合，特别是在编码情感方面。讲座将从各个领域，特别是健康领域，提取数据来说明一些应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Multimodal Approach to Understanding Human Vocal Expressions and Beyond

Human verbal and nonverbal expressions carry crucial information not only about intent but also emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms including the variability therein, within and across individuals. From a technology perspective, finding ways for automatically processing and decoding this complex information continues to be of interest across a variety of applications. The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body especially in encoding affect. The talk will draw data from various domains notably in health to illustrate some of the applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 20th ACM International Conference on Multimodal Interaction

自引率

0.00%

发文量