Emphatic human interaction analysis for cognitive environments
C. Regazzoni
ACM/IEEE international workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Stream, 2010-10-29
DOI: 10.1145/1877868.1877870 (https://doi.org/10.1145/1877868.1877870)
Citations: 1
Abstract
Understanding the dynamic evolution of complex scenes, in which multiple patterns interact according to a hidden semantic goal, is an open issue for current intelligent environments. The issue is made more complex by the increasingly widespread and intensive use of camera systems to help human operators with monitoring tasks. Analyzing multimedia data from a large set of cameras that simultaneously monitor different environments makes it necessary not only to focus the attention of human operators on relevant events, but also to actively support their decisions about the best reactions for managing abnormal situations. The cognitive tasks to be modeled in integrated intelligent systems thus include not only multisensor data processing and scene understanding, but also proactive decision making: a recognized abnormal interaction occurring in the scene should, where possible, be controlled so that divergence from the normal event flow cannot compromise the security level of the environment.
Cognitive environments often aim to improve, in a user-friendly way, the usefulness of a given physical space to humans according to a given paradigm and objective of use. To this end, they often employ pervasive communication tools to send messages to cooperative humans in the environment, supporting them in the real-time situations they are experiencing so that they can accomplish their tasks more smoothly and effectively. To do so, they can use situation assessment tools that interpret the available sensor data in terms of the dynamic states and events generated by the objects present in the scene and by their interactions. In many cases, the assessed situation can be not only estimated but also predicted, provided that dynamic models of it are available.
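As an illustration of the last point, the sketch below shows one very simple way in which a dynamic model lets a tracked quantity be both estimated from noisy measurements and predicted ahead. It uses an alpha-beta filter with a constant-velocity model along a single axis. The choice of filter, the `Track1D` type, the gains `alpha` and `beta`, and all numeric values are assumptions made for illustration and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """Hypothetical state of one tracked object along a single axis."""
    pos: float
    vel: float


def predict(track: Track1D, dt: float) -> Track1D:
    """Dynamic model (assumed constant velocity): move the state dt ahead."""
    return Track1D(track.pos + track.vel * dt, track.vel)


def update(track: Track1D, measurement: float, dt: float,
           alpha: float = 0.5, beta: float = 0.1) -> Track1D:
    """Alpha-beta estimation step: predict, then correct with the residual."""
    predicted = predict(track, dt)
    residual = measurement - predicted.pos
    return Track1D(predicted.pos + alpha * residual,
                   predicted.vel + (beta / dt) * residual)


def forecast(track: Track1D, dt: float, steps: int) -> list[float]:
    """Predict future positions with the dynamic model only (no new data)."""
    positions = []
    for _ in range(steps):
        track = predict(track, dt)
        positions.append(track.pos)
    return positions


if __name__ == "__main__":
    track = Track1D(pos=0.0, vel=1.0)
    for z in [1.1, 1.9, 3.2]:          # made-up noisy position measurements
        track = update(track, z, dt=1.0)
    print(track, forecast(track, dt=1.0, steps=3))
```

The same predict/correct split carries over to richer Bayesian filters; only the state representation and the dynamic model change.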
The capability of predicting the behavior of objects during a given interaction can be interpreted in two ways: as a means of directly evaluating how the actions of a given object evolve within the contextual framework determined by the object it interacts with, and as a means of estimating and predicting (based on indirect observation and an appropriate model) the subjective emotional and motivational hidden variables that led the object to choose a certain action on the basis of subjectively sensed data. Therefore, if appropriate models are available, a sort of empathic interaction analysis can be performed that should allow a cognitive environment to be "immersively" connected with the interacting entities and able to predict the actions they will take in a given contextual situation.
Cognitive environments can take advantage of such empathic interaction analysis when they can communicate with some of the humans involved in a given interaction, for example through wireless terminals or variable message panels in the physical environment. In this case, it becomes interesting to study which architectures and processing methods can be used to design the intelligence of cognitive environments as a set of concurrent continuous loops that close the gap between sensing and acting on a world evolving in real time.
Building on these premises, this talk focuses on methods for video analysis of human interaction that are based on data representations suitable for "immersive" estimation and prediction by an observing intelligent environment. Examples will be discussed of Bayesian approaches to the representation and learning of interactions from video scene examples, currently studied in our research group (www.isip40.it).
These approaches range from video tracking and behavior understanding, which aim to provide a robust basic vocabulary of video processing tools for detecting and analyzing human motion at finer resolution scales (i.e. multiple-feature dynamic shape analysis), to the development of methods for representing empathic models of interactions at coarser, trajectory-based scales. Coupled Dynamic Bayesian Networks are used in both cases as a guideline for problem representation. In the latter case, at the coarser trajectory level of analysis, the interaction structure is also learned using bio-inspired principles. In both cases, incremental adaptation is obtained as a result of the Bayesian approach followed. The talk will present architectural schemes and examples of the use of such techniques within cognitive systems, where cooperative humans can be helped to perform a given interaction task by predictions obtained from empathic interaction models.
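To make the idea of coupling concrete, here is a minimal sketch of one filtering step in a two-chain coupled hidden Markov model, which is a simple special case of a coupled Dynamic Bayesian Network. Each agent's next hidden state depends on its own previous state and on the other agent's previous state. The function names, table layouts, and state semantics are hypothetical; the abstract does not specify the models used in the talk.

```python
from itertools import product


def coupled_forward_step(belief, trans_a, trans_b, emit_a, emit_b, obs_a, obs_b):
    """One Bayesian filtering step for two coupled hidden chains A and B.

    Assumed (illustrative) table layout:
      belief[(a, b)]       = P(a, b | past observations)
      trans_a[a1][b1][a2]  = P(a2 | a1, b1)   # A's next state depends on B
      trans_b[b1][a1][b2]  = P(b2 | b1, a1)   # B's next state depends on A
      emit_a[a][o], emit_b[b][o] = observation likelihoods
    """
    n_a, n_b = len(trans_a), len(trans_b)
    states = list(product(range(n_a), range(n_b)))
    posterior = {}
    for a2, b2 in states:
        # Prediction: propagate the joint belief through both coupled models.
        prior = sum(belief[(a1, b1)] * trans_a[a1][b1][a2] * trans_b[b1][a1][b2]
                    for a1, b1 in states)
        # Correction: weight by how well each state explains the observations.
        posterior[(a2, b2)] = prior * emit_a[a2][obs_a] * emit_b[b2][obs_b]
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observations have zero likelihood under the model")
    return {state: p / total for state, p in posterior.items()}


def predict_next_a(belief, trans_a):
    """Predicted distribution of agent A's next state, given the joint belief."""
    n_a = len(trans_a)
    return [sum(p * trans_a[a1][b1][a2] for (a1, b1), p in belief.items())
            for a2 in range(n_a)]


if __name__ == "__main__":
    # Toy tables: A keeps its state; B moves to whatever state A was in.
    trans_a = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    trans_b = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    emit = [[0.9, 0.1], [0.1, 0.9]]
    belief = {(a, b): 0.25 for a, b in product(range(2), range(2))}
    belief = coupled_forward_step(belief, trans_a, trans_b, emit, emit, 0, 0)
    print(belief, predict_next_a(belief, trans_a))
```

Filtering over the joint state keeps the example exact but scales with the product of the two state spaces; larger models would typically need approximate inference.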