Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops: Latest Publications

Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.
Yue Gu, Xinyu Lyu, Weijia Sun, Weitian Li, Shuhong Chen, Xinyu Li, Ivan Marsic
DOI: 10.1145/3343031.3351039 | Volume 2019, pages 157-166 | Published 2019-10-01
Abstract: Emotion recognition in dyadic communication is challenging because: 1. Extracting informative modality-specific representations requires disparate feature extractor designs due to the heterogeneous input data formats. 2. Effectively and efficiently fusing unimodal features and learning associations between dyadic utterances are critical to model generalization in actual scenarios. 3. Disagreeing annotations prevent previous approaches from precisely predicting emotions in context. To address these issues, we propose an efficient dyadic fusion network that relies only on an attention mechanism to select representative vectors, fuse modality-specific features, and learn the sequence information. Our approach has three distinct characteristics: 1. Instead of using a recurrent neural network to extract temporal associations as in most previous research, we introduce multiple sub-view attention layers to compute the relevant dependencies among sequential utterances; this significantly improves model efficiency. 2. To improve fusion performance, we design a learnable mutual correlation factor inside each attention layer to compute associations across different modalities. 3. To overcome the label disagreement issue, we embed the labels from all annotators into a k-dimensional vector and transform the categorical problem into a regression problem; this method provides more accurate annotation information and fully uses the entire dataset. We evaluate the proposed model on two published multimodal emotion recognition datasets: IEMOCAP and MELD. Our model significantly outperforms previous state-of-the-art research by 3.8%-7.5% accuracy, using a more efficient model.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085887/pdf/nihms-1571671.pdf
Citations: 0
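A minimal PyTorch-style sketch of two ideas from this abstract, under assumptions: a learnable mutual correlation factor that scales cross-modal attention logits, and a soft label vector built from disagreeing annotators so the task becomes regression. Module names, projection sizes, and the pooling step are illustrative, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualCorrelationAttention(nn.Module):
    """Cross-modal attention gated by a learnable mutual correlation factor.

    Illustrative sketch only: the projections and the way the factor scales
    the attention logits are assumptions, not the paper's exact layer.
    """

    def __init__(self, dim_a: int, dim_b: int, dim_shared: int = 128):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_shared)   # e.g. acoustic features
        self.proj_b = nn.Linear(dim_b, dim_shared)   # e.g. lexical features
        # learnable mutual correlation factor shared by both directions
        self.corr = nn.Parameter(torch.ones(1))

    def forward(self, feats_a, feats_b):
        # feats_a: (batch, Ta, dim_a); feats_b: (batch, Tb, dim_b)
        a = self.proj_a(feats_a)                      # (batch, Ta, d)
        b = self.proj_b(feats_b)                      # (batch, Tb, d)
        logits = torch.bmm(a, b.transpose(1, 2))      # (batch, Ta, Tb)
        logits = self.corr * logits / a.size(-1) ** 0.5
        attn_ab = F.softmax(logits, dim=-1)           # a attends over b
        attn_ba = F.softmax(logits.transpose(1, 2), dim=-1)
        fused_a = torch.bmm(attn_ab, b)               # (batch, Ta, d)
        fused_b = torch.bmm(attn_ba, a)               # (batch, Tb, d)
        # pool each stream and concatenate into a joint representation
        return torch.cat([fused_a.mean(1), fused_b.mean(1)], dim=-1)

def soft_label(annotations, k):
    """Soft target over k emotion classes from several annotators' votes.

    Assumption: simple vote proportions stand in for the paper's
    k-dimensional label embedding.
    """
    counts = torch.bincount(torch.tensor(annotations), minlength=k).float()
    return counts / counts.sum()
```

With such a target, the model can be trained with a regression loss (e.g. mean squared error against the vote proportions) instead of a hard categorical label.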
Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder.
Yue Gu, Xinyu Li, Kaixiang Huang, Shiyu Fu, Kangning Yang, Shuhong Chen, Moliang Zhou, Ivan Marsic
DOI: 10.1145/3240508.3240714 | Volume 2018, pages 537-545 | Published 2018-10-01
Abstract: Human conversation analysis is challenging because meaning can be expressed through words, intonation, or even body language and facial expression. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data that are then formulated into conversation-level features. The corresponding hierarchical decoder is able to predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. We achieved comparable performance in predicting co-existing labels using the proposed model instead of multiple individual models. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism helps feature selection and improves model interpretability.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085889/pdf/nihms-1571718.pdf
Citations: 0
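A minimal sketch, assuming each modality has already been encoded into an utterance-level vector of a common size, of how modality attention could weight and fuse video, audio, and text features. The scoring network and tensor shapes are illustrative rather than the paper's exact fusion strategy; the returned weights are the kind of quantity the abstract says can be visualized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAttentionFusion(nn.Module):
    """Fuse per-modality utterance vectors with learned modality weights."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar score per modality vector

    def forward(self, modality_vecs):
        # modality_vecs: (batch, n_modalities, dim), e.g. [video, audio, text]
        weights = F.softmax(self.score(modality_vecs), dim=1)  # (batch, M, 1)
        fused = (weights * modality_vecs).sum(dim=1)           # (batch, dim)
        return fused, weights.squeeze(-1)   # weights can be plotted directly

# Usage sketch: three modalities, 256-dim utterance vectors
fusion = ModalityAttentionFusion(dim=256)
fused, w = fusion(torch.randn(8, 3, 256))
print(fused.shape, w.shape)   # torch.Size([8, 256]) torch.Size([8, 3])
```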
Cross-Modal Health State Estimation.
Nitish Nag, Vaibhav Pandey, Preston J Putzel, Hari Bhimaraju, Srikanth Krishnan, Ramesh Jain
DOI: 10.1145/3240508.3241913 | Volume 2018, pages 1993-2002 | Published 2018-10-01
Abstract: Individuals create and consume more diverse data about themselves today than at any time in history. Sources of this data include wearable devices, images, social media, geo-spatial information, and more. A tremendous opportunity rests within cross-modal data analysis that leverages existing domain knowledge methods to understand and guide human health. Especially in chronic diseases, current medical practice uses a combination of sparse hospital-based biological metrics (blood tests, expensive imaging, etc.) to understand the evolving health status of an individual. Future health systems must integrate data created at the individual level to better understand health status perpetually, especially in a cybernetic framework. In this work we fuse multiple user-created and open-source data streams along with established biomedical domain knowledge to give two types of quantitative state estimates of cardiovascular health. First, we use wearable devices to calculate cardiorespiratory fitness (CRF), a known quantitative leading predictor of heart disease that is not routinely collected in clinical settings. Second, we estimate inherent genetic traits, living environmental risks, circadian rhythm, and biological metrics from a diverse dataset. Our experimental results on 24 subjects demonstrate how multi-modal data can provide personalized health insight. Understanding the dynamic nature of health status will pave the way for better health-based recommendation engines, better clinical decision making, and positive lifestyle changes.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530992/pdf/nihms-1026575.pdf
Citations: 0
Region-based Activity Recognition Using Conditional GAN.
Xinyu Li, Yanyi Zhang, Jianyu Zhang, Yueyang Chen, Huangcan Li, Ivan Marsic, Randall S Burd
DOI: 10.1145/3123266.3123365 | Volume 2017, pages 1059-1067 | Published 2017-10-01
Abstract: We present a method for activity recognition that first estimates the activity performer's location and uses it with the input data for activity recognition. Existing approaches directly take video frames or entire videos for feature extraction and recognition, and treat the classifier as a black box. Our method first locates the activities in each input video frame by generating an activity mask using a conditional generative adversarial network (cGAN). The generated mask is appended to the color channels of the input images and fed into a VGG-LSTM network for activity recognition. To test our system, we produced two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks facilitate the learning of features that are representative of the activity rather than of accidental surrounding information.
Citations: 22
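A minimal sketch of the fusion step this abstract describes: the generated activity mask is concatenated to the RGB channels and the resulting four-channel frames are passed to a CNN+LSTM classifier. The cGAN generator is omitted, the small convolutional backbone is a stand-in for VGG, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MaskedFrameClassifier(nn.Module):
    """Per-frame activity classifier over RGB + mask channels."""

    def __init__(self, n_classes: int, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(           # accepts 4 channels: RGB + mask
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames, masks):
        # frames: (batch, T, 3, H, W); masks: (batch, T, 1, H, W) from the cGAN
        x = torch.cat([frames, masks], dim=2)     # (batch, T, 4, H, W)
        b, t = x.shape[:2]
        feats = self.backbone(x.flatten(0, 1)).flatten(1)   # (b*t, 64)
        out, _ = self.lstm(feats.view(b, t, -1))             # temporal model
        return self.head(out)                     # per-frame activity logits

# Usage sketch: batch of 2 clips, 8 frames each, 64x64 resolution
model = MaskedFrameClassifier(n_classes=10)
logits = model(torch.rand(2, 8, 3, 64, 64), torch.rand(2, 8, 1, 64, 64))
print(logits.shape)   # torch.Size([2, 8, 10])
```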
On Shape and the Computability of Emotions.
Xin Lu, Poonam Suryanarayan, Reginald B Adams, Jia Li, Michelle G Newman, James Z Wang
DOI: 10.1145/2393347.2393384 | Volume 2012, pages 229-238 | Published 2012-10-01
Abstract: We investigated how shape features in natural images influence the emotions aroused in human beings. Shapes and their characteristics, such as roundness, angularity, simplicity, and complexity, have been postulated to affect the emotional responses of human beings in the fields of visual arts and psychology. However, no prior research has modeled the dimensionality of emotions aroused by roundness and angularity. Our contributions include an in-depth statistical analysis to understand the relationship between shapes and emotions. Through experimental results on the International Affective Picture System (IAPS) dataset, we provide evidence for the significance of roundness-angularity and simplicity-complexity in predicting emotional content in images. We combine our shape features with other state-of-the-art features to show a gain in prediction and classification accuracy. We model emotions from a dimensional perspective in order to predict valence and arousal ratings, which has advantages over modeling the traditional discrete emotional categories. Finally, we distinguish images with strong emotional content from emotionally neutral images with high accuracy.
Citations: 139
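A minimal sketch of the dimensional framing in this abstract: toy roundness-angularity and complexity descriptors feed a regressor that predicts valence and arousal jointly instead of discrete emotion classes. The feature definitions and the numeric targets below are made up for illustration; they are not the paper's descriptors or IAPS data.

```python
import numpy as np
from sklearn.linear_model import Ridge

def shape_features(turn_angles_deg, n_segments):
    """Toy shape descriptors for one image's detected contour segments."""
    angles = np.asarray(turn_angles_deg, dtype=float)
    angularity = np.mean(np.abs(angles) > 30)   # share of sharp turns
    roundness = 1.0 - angularity
    complexity = np.log1p(n_segments)           # more segments -> more complex
    return np.array([roundness, angularity, complexity])

# X: one row of shape features per image; y: (valence, arousal) ratings
X = np.vstack([shape_features([10, 45, 80], 120),
               shape_features([5, 8, 12], 30)])
y = np.array([[4.2, 3.1],     # made-up example ratings on a 1-9 scale
              [6.5, 2.4]])
model = Ridge(alpha=1.0).fit(X, y)   # joint regression of both dimensions
print(model.predict(X))
```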
Recognizing Clothes Patterns for Blind People by Confidence Margin based Feature Combination.
Xiaodong Yang, Shuai Yuan, YingLi Tian
DOI: 10.1145/2072298.2071947 | Volume 2011, pages 1097-1100 | Published 2011-01-01
Abstract: Clothes pattern recognition is a challenging task for blind or visually impaired people. Automatic clothes pattern recognition is also a challenging problem in computer vision due to the large pattern variations. In this paper, we present a new method to classify clothes patterns into four categories: stripe, lattice, special, and patternless. While existing texture analysis methods have mainly focused on textures varying with distinctive pattern changes, they cannot achieve the same level of accuracy for clothes pattern recognition because of the large intra-class variations in each clothes pattern category. To solve this problem, we extract both structural and statistical features from image wavelet subbands. Furthermore, we develop a new feature combination scheme based on the confidence margin of a classifier to combine the two types of features into a novel local image descriptor in a compact and discriminative format. The recognition experiment is conducted on a database with 627 clothes images in four categories of patterns. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art texture analysis methods in the context of clothes pattern recognition.
Citations: 34
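A minimal sketch of the two components this abstract names: statistical features from first-level wavelet subbands, and a confidence-margin rule for combining the outputs driven by two different feature types. The margin rule used here (weight each output by its top-1 minus top-2 probability) is an assumption, not necessarily the paper's exact scheme.

```python
import numpy as np
import pywt

def wavelet_statistics(gray_image):
    """Energy and standard deviation of each first-level wavelet subband."""
    cA, (cH, cV, cD) = pywt.dwt2(gray_image, "haar")
    feats = []
    for band in (cA, cH, cV, cD):
        feats += [np.mean(band ** 2), np.std(band)]
    return np.array(feats)

def confidence_margin(probs):
    """Top-1 minus top-2 class probability of one classifier's output."""
    top2 = np.sort(probs)[-2:]
    return top2[1] - top2[0]

def combine(probs_structural, probs_statistical):
    """Weight each classifier's vote by how confident it is."""
    m1 = confidence_margin(probs_structural)
    m2 = confidence_margin(probs_statistical)
    w1, w2 = m1 / (m1 + m2 + 1e-9), m2 / (m1 + m2 + 1e-9)
    return w1 * probs_structural + w2 * probs_statistical

# Example: 4 classes (stripe, lattice, special, patternless)
img = np.random.rand(64, 64)
print(wavelet_statistics(img).shape)   # (8,) statistical descriptor
print(combine(np.array([0.6, 0.2, 0.1, 0.1]),
              np.array([0.3, 0.3, 0.2, 0.2])))
```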