Emphatic human interaction analysis for cognitive environments

ACM/IEEE international workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Stream

Pub Date: 2010-10-29

DOI: 10.1145/1877868.1877870

C. Regazzoni

Citation count: 1

Abstract

Understanding the dynamic evolution of complex scenes, in which multiple patterns interact according to a hidden semantic goal, is an open issue for current intelligent environments. The issue is made more complex by the increasingly widespread and intensive use of camera systems to help human operators in monitoring tasks. Analyzing multimedia data provided by a large set of cameras simultaneously monitoring different environments makes it necessary not only to focus the attention of human operators on relevant events as they occur, but also to actively support their decisions about the best reactions for managing abnormal situations. The cognitive tasks to be modeled in integrated intelligent systems therefore include not only multisensor data processing and scene understanding, but also proactive decision making: a recognized abnormal interactive situation occurring in the scene should, where possible, be controlled so that divergence from the normal event flow cannot compromise the security level of the environment.

Cognitive environments often aim to improve, in a user-friendly way, the usefulness of a given physical space to humans according to a given paradigm and objective of use. To this end, they often employ pervasive communication tools to send messages to cooperative humans in the environment, helping them in real time in the situations they are experiencing so that they can accomplish their tasks more smoothly and effectively. To do so, they can use situation assessment tools that interpret the available sensor data in terms of the dynamic states and events generated by the objects present in the scene and by their interactions. In many cases, the assessed situation can be not only estimated but also predicted, if dynamic models of it are available.

The capability of predicting the behavior of objects during a given interaction can be interpreted as a way to directly evaluate the evolution of the actions of a given object within a contextual framework determined by the interacting object. It can also be interpreted as a way to estimate and predict (based on indirect observation and an appropriate model) the subjective emotional and motivational hidden variables that led the object to decide on a certain action on the basis of subjectively sensed data. Therefore, if appropriate models are available, a sort of empathic interaction analysis can be performed that should allow a cognitive environment to be "immersively" connected with the interacting entities and able to predict the actions they will take in a given contextual situation.

Cognitive environments can take advantage of such an empathic interaction analysis when they can communicate with some of the humans involved in a given interaction, for example by using wireless terminals or variable message panels in the physical environment. In this case, it becomes interesting to study which architectures and processing methods can be used to design the intelligence of a cognitive environment as a set of concurrent continuous loops that close the gap between sensing and acting on a world evolving in real time.

Building on these premises, this talk focuses on human interaction video analysis methods based on data representations that allow "immersive" estimation and prediction by an observing intelligent environment. Examples will be discussed of Bayesian approaches to the representation and learning of interactions from video scene examples, currently studied in our research group (www.isip40.it).

Such approaches range from video tracking and behavior understanding, which aim to provide a robust basic vocabulary of video processing tools to detect and analyze human motion at finer resolution scales (i.e. multiple-feature dynamic shape analysis), to the development of methods for representing empathic models of interactions at coarser, trajectory-based scales. Coupled Dynamic Bayesian Networks are used in both cases as a problem representation guideline. In the latter case, at the coarser trajectory-level scale of analysis, the interaction structure is also learned by using bio-inspired principles. In both cases, incremental adaptation is obtained as a result of the Bayesian approach followed. The talk will provide architectural schemes and examples of the use of such techniques within cognitive systems in which cooperative humans can be helped in performing a given interaction task by predictions obtained from empathic interaction models.
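
The abstract does not specify the models used, so the following is only a minimal sketch of the general idea behind a coupled dynamic Bayesian network for two interacting entities: each entity's next state depends on its own previous state and on the other entity's previous state, and a joint belief is alternately updated with observations and propagated forward to predict behavior. The two-state vocabulary ("walking", "stopped"), the `TRANS` table, the likelihood values, and the function names `predict`, `update`, and `marginal_a` are all hypothetical choices made for illustration; this is a generic discrete Bayesian filter, not the method presented in the talk.

```python
from itertools import product

# Hypothetical states for each entity: 0 = "walking", 1 = "stopped".
# TRANS[own_prev][other_prev] is the distribution over the entity's next state.
# The numbers are illustrative assumptions, not values from the talk.
TRANS = [
    [[0.9, 0.1], [0.4, 0.6]],  # own_prev = walking; other walking / stopped
    [[0.5, 0.5], [0.1, 0.9]],  # own_prev = stopped; other walking / stopped
]


def predict(belief, trans_a, trans_b):
    """Propagate the joint belief P(a, b) one time step ahead."""
    n_a, n_b = len(belief), len(belief[0])
    out = [[0.0] * n_b for _ in range(n_a)]
    for a, b in product(range(n_a), range(n_b)):
        for a2, b2 in product(range(n_a), range(n_b)):
            # Coupling: each entity's transition depends on both previous states.
            out[a2][b2] += belief[a][b] * trans_a[a][b][a2] * trans_b[b][a][b2]
    return out


def update(belief, lik_a, lik_b):
    """Bayes update with per-entity observation likelihoods P(z | state)."""
    post = [[belief[a][b] * lik_a[a] * lik_b[b] for b in range(len(lik_b))]
            for a in range(len(lik_a))]
    total = sum(sum(row) for row in post)
    if total == 0.0:
        raise ValueError("observations have zero probability under the belief")
    return [[p / total for p in row] for row in post]


def marginal_a(belief):
    """Marginal distribution of entity A's state."""
    return [sum(row) for row in belief]


if __name__ == "__main__":
    belief = [[0.25, 0.25], [0.25, 0.25]]  # uniform prior over (a, b)
    # Hypothetical observations suggesting A is walking and B has stopped.
    belief = update(belief, lik_a=[0.8, 0.2], lik_b=[0.1, 0.9])
    forecast = predict(belief, TRANS, TRANS)
    print("P(A stops at the next step) =", marginal_a(forecast)[1])
```

In this toy model, the forecast for A changes with the estimated state of B, which is the sense in which the prediction is conditioned on the interaction rather than on A's trajectory alone. A real system would learn the transition tables from video data and would likely use continuous trajectory features instead of two hand-picked states.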
