Human-centered Multimodal Machine Intelligence

Shrikanth S. Narayanan
{"title":"Human-centered Multimodal Machine Intelligence","authors":"Shrikanth S. Narayanan","doi":"10.1145/3382507.3417974","DOIUrl":null,"url":null,"abstract":"Multimodal machine intelligence offers enormous possibilities for helping understand the human condition and in creating technologies to support and enhance human experiences [1, 2]. What makes such approaches and systems exciting is the promise they hold for adaptation and personalization in the presence of the rich and vast inherent heterogeneity, variety and diversity within and across people. Multimodal engineering approaches can help analyze human trait (e.g., age), state (e.g., emotion), and behavior dynamics (e.g., interaction synchrony) objectively, and at scale. Machine intelligence could also help detect and analyze deviation in patterns from what is deemed typical. These techniques in turn can assist, facilitate or enhance decision making by humans, and by autonomous systems. Realizing such a promise requires addressing two major lines of, oft intertwined, challenges: creating inclusive technologies that work for everyone while enabling tools that can illuminate the source of variability or difference of interest. This talk will highlight some of these possibilities and opportunities through examples drawn from two specific domains. The first relates to advancing health informatics in behavioral and mental health [3, 4]. With over 10% of the world's population affected, and with clinical research and practice heavily dependent on (relatively scarce) human expertise in diagnosing, managing and treating the condition, engineering opportunities in offering access and tools to support care at scale are immense. For example, in determining whether a child is on the Autism spectrum, a clinician would engage and observe a child in a series of interactive activities, targeting relevant cognitive, communicative and socio- emotional aspects, and codify specific patterns of interest e.g., typicality of vocal intonation, facial expressions, joint attention behavior. Machine intelligence driven processing of speech, language, visual and physiological data, and combining them with other forms of clinical data, enable novel and objective ways of supporting and scaling up these diagnostics. Likewise, multimodal systems can automate the analysis of a psychotherapy session, including computing treatment quality-assurance measures e.g., rating a therapist's expressed empathy. These technology possibilities can go beyond the traditional realm of clinics, directly to patients in their natural settings. For example, remote multimodal sensing of biobehavioral cues can enable new ways for screening and tracking behaviors (e.g., stress in workplace) and progress to treatment (e.g., for depression), and offer just in time support. The second example is drawn from the world of media. Media are created by humans and for humans to tell stories. They cover an amazing range of domains'from the arts and entertainment to news, education and commerce and in staggering volume. Machine intelligence tools can help analyze media and measure their impact on individuals and society. This includes offering objective insights into diversity and inclusion in media representations through robustly characterizing media portrayals from an intersectional perspective along relevant dimensions of inclusion: gender, race, gender, age, ability and other attributes, and in creating tools to support change [5,6]. 
Again this underscores the twin technology requirements: to perform equally well in characterizing individuals regardless of the dimensions of the variability, and use those inclusive technologies to shine light on and create tools to support diversity and inclusion.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"75 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3382507.3417974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Multimodal machine intelligence offers enormous possibilities for helping us understand the human condition and for creating technologies that support and enhance human experiences [1, 2]. What makes such approaches and systems exciting is the promise they hold for adaptation and personalization in the presence of the rich and vast inherent heterogeneity, variety, and diversity within and across people. Multimodal engineering approaches can help analyze human traits (e.g., age), states (e.g., emotion), and behavior dynamics (e.g., interaction synchrony) objectively and at scale. Machine intelligence could also help detect and analyze deviations in patterns from what is deemed typical. These techniques in turn can assist, facilitate, or enhance decision making by humans and by autonomous systems. Realizing such a promise requires addressing two major, often intertwined, lines of challenge: creating inclusive technologies that work for everyone, while enabling tools that can illuminate the source of variability or difference of interest. This talk will highlight some of these possibilities and opportunities through examples drawn from two specific domains.

The first relates to advancing health informatics in behavioral and mental health [3, 4]. With over 10% of the world's population affected, and with clinical research and practice heavily dependent on (relatively scarce) human expertise in diagnosing, managing, and treating these conditions, the engineering opportunities in offering access and tools to support care at scale are immense. For example, in determining whether a child is on the Autism spectrum, a clinician would engage and observe the child in a series of interactive activities targeting relevant cognitive, communicative, and socio-emotional aspects, and codify specific patterns of interest, e.g., the typicality of vocal intonation, facial expressions, and joint attention behavior. Machine-intelligence-driven processing of speech, language, visual, and physiological data, combined with other forms of clinical data, enables novel and objective ways of supporting and scaling up these diagnostics. Likewise, multimodal systems can automate the analysis of a psychotherapy session, including computing treatment quality-assurance measures, e.g., rating a therapist's expressed empathy. These technology possibilities can extend beyond the traditional realm of the clinic, directly to patients in their natural settings. For example, remote multimodal sensing of biobehavioral cues can enable new ways of screening for and tracking behaviors (e.g., workplace stress) and treatment progress (e.g., for depression), and can offer just-in-time support.

The second example is drawn from the world of media. Media are created by humans, and for humans, to tell stories. They cover an amazing range of domains, from the arts and entertainment to news, education, and commerce, and in staggering volume. Machine intelligence tools can help analyze media and measure their impact on individuals and society. This includes offering objective insights into diversity and inclusion in media representations, by robustly characterizing media portrayals from an intersectional perspective along relevant dimensions of inclusion: gender, race, age, ability, and other attributes, and by creating tools to support change [5, 6]. Again, this underscores the twin technology requirements: to perform equally well in characterizing individuals regardless of the dimensions of variability, and to use those inclusive technologies to shine a light on, and create tools to support, diversity and inclusion.