Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Historical Context-based Style Classification of Painting Images via Label Distribution Learning
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240593
Jufeng Yang, Liyi Chen, Le Zhang, Xiaoxiao Sun, Dongyu She, Shao-Ping Lu, Ming-Ming Cheng
{"title":"Historical Context-based Style Classification of Painting Images via Label Distribution Learning","authors":"Jufeng Yang, Liyi Chen, Le Zhang, Xiaoxiao Sun, Dongyu She, Shao-Ping Lu, Ming-Ming Cheng","doi":"10.1145/3240508.3240593","DOIUrl":"https://doi.org/10.1145/3240508.3240593","url":null,"abstract":"Analyzing and categorizing the style of visual art images, especially paintings, is gaining popularity owing to its importance in understanding and appreciating the art. The evolution of painting style is both continuous, in a sense that new styles may inherit, develop or even mutate from their predecessors and multi-modal because of various issues such as the visual appearance, the birthplace, the origin time and the art movement. Motivated by this peculiarity, we introduce a novel knowledge distilling strategy to assist visual feature learning in the convolutional neural network for painting style classification. More specifically, a multi-factor distribution is employed as soft-labels to distill complementary information with visual input, which extracts from different historical context via label distribution learning. The proposed method is well-encapsulated in a multi-task learning framework which allows end-to-end training. We demonstrate the superiority of the proposed method over the state-of-the-art approaches on Painting91, OilPainting, and Pandora datasets.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125681871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
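A minimal PyTorch sketch of the soft-label idea described above: one head is trained with cross-entropy on the hard style label, while a second head is pulled by KL divergence toward a historical-context label distribution. Every name here (StyleNet, dist_head, the weight alpha, the generic backbone) is an illustrative assumption, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_styles: int):
        super().__init__()
        self.backbone = backbone                            # any CNN feature extractor
        self.style_head = nn.Linear(feat_dim, num_styles)   # hard-label task
        self.dist_head = nn.Linear(feat_dim, num_styles)    # soft-label task

    def forward(self, x):
        f = self.backbone(x)                 # assumed to return (batch, feat_dim)
        return self.style_head(f), self.dist_head(f)

def multitask_loss(style_logits, dist_logits, hard_label, label_dist, alpha=0.5):
    # Standard cross-entropy on the ground-truth style label...
    ce = F.cross_entropy(style_logits, hard_label)
    # ...plus KL divergence pulling the second head toward the
    # historical-context label distribution (the "soft label").
    kl = F.kl_div(F.log_softmax(dist_logits, dim=1), label_dist,
                  reduction="batchmean")
    return ce + alpha * kl
```

Because both heads share the backbone, the historical-context distribution shapes the visual features end-to-end, which is the point of the multi-task encapsulation.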
Dest-ResNet: A Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240656
Binbing Liao, Jingqing Zhang, Ming Cai, Siliang Tang, Yifan Gao, Chao Wu, Shengwen Yang, Wenwu Zhu, Yike Guo, Fei Wu
{"title":"Dest-ResNet: A Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction","authors":"Binbing Liao, Jingqing Zhang, Ming Cai, Siliang Tang, Yifan Gao, Chao Wu, Shengwen Yang, Wenwu Zhu, Yike Guo, Fei Wu","doi":"10.1145/3240508.3240656","DOIUrl":"https://doi.org/10.1145/3240508.3240656","url":null,"abstract":"With the ever-increasing urbanization process, the traffic jam has become a common problem in the metropolises around the world, making the traffic speed prediction a crucial and fundamental task. This task is difficult due to the dynamic and intrinsic complexity of the traffic environment in urban cities, yet the emergence of crowd map query data sheds new light on it. In general, a burst of crowd map queries for the same destination in a short duration (called \"hotspot'') could lead to traffic congestion. For example, queries of the Capital Gym burst on weekend evenings lead to traffic jams around the gym. However, unleashing the power of crowd map queries is challenging due to the innate spatiotemporal characteristics of the crowd queries. To bridge the gap, this paper firstly discovers hotspots underlying crowd map queries. These discovered hotspots address the spatiotemporal variations. Then Dest-ResNet (Deep spatiotemporal Residual Network) is proposed for hotspot traffic speed prediction. Dest-ResNet is a sequence learning framework that jointly deals with two sequences in different modalities, i.e., the traffic speed sequence and the query sequence. The main idea of Dest-ResNet is to learn to explain and amend the errors caused when the unimodal information is applied individually. In this way, Dest-ResNet addresses the temporal causal correlation between queries and the traffic speed. As a result, Dest-ResNet shows a 30% relative boost over the state-of-the-art methods on real-world datasets from Baidu Map.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126813619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
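A minimal sketch of the residual two-sequence idea, under stated assumptions: univariate speed and query-count sequences, single-layer GRUs, and a correction head that amends the unimodal speed prediction. None of the names or shapes come from the paper.

```python
import torch
import torch.nn as nn

class HotspotSpeedModel(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.speed_rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.query_rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.base_head = nn.Linear(hidden, 1)          # unimodal speed prediction
        self.residual_head = nn.Linear(2 * hidden, 1)  # query-aware correction

    def forward(self, speed_seq, query_seq):
        # speed_seq, query_seq: (batch, time, 1)
        _, hs = self.speed_rnn(speed_seq)
        _, hq = self.query_rnn(query_seq)
        hs, hq = hs[-1], hq[-1]                        # final hidden states
        base = self.base_head(hs)                      # what speed history alone predicts
        # The query branch learns to "explain and amend" the unimodal error,
        # added residually rather than replacing the base prediction.
        correction = self.residual_head(torch.cat([hs, hq], dim=1))
        return base + correction
```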
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240575
Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, H. Meng
{"title":"Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs","authors":"Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, H. Meng","doi":"10.1145/3240508.3240575","DOIUrl":"https://doi.org/10.1145/3240508.3240575","url":null,"abstract":"Human-computer conversational interactions are increasingly pervasive in real-world applications, such as chatbots and virtual assistants. The user experience can be enhanced through affective design of such conversational dialogs, especially in enabling the computer to understand the emotive state in the user's input, and to generate an appropriate system response within the dialog turn. Such a system response may further influence the user's emotive state in the subsequent dialog turn. In this paper, we focus on the change in the user's emotive states in adjacent dialog turns, to which we refer as user emotive state change. We propose a multi-modal, multi-task deep learning framework to infer the user's emotive states and emotive state changes simultaneously. Multi-task learning convolution fusion auto-encoder is applied to fuse the acoustic and textual features to generate a robust representation of the user's input. Long-short term memory recurrent auto-encoder is employed to extract features of system responses at the sentence-level to better capture factors affecting user emotive states. Multi-task learned structured output layer is adopted to model the dependency of user emotive state change, conditioned upon the user input's emotive states and system response in current dialog turn. Experimental results demonstrate the effectiveness of the proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128107137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
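The dependency structure described above can be sketched as two encoders feeding a state head and a change head, with the change head conditioned on both the response features and the state prediction. The linear encoders below merely stand in for the paper's convolution fusion auto-encoder and LSTM response encoder; all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class EmotiveChangeModel(nn.Module):
    def __init__(self, user_dim, resp_dim, hidden=128, num_states=4, num_changes=3):
        super().__init__()
        self.user_enc = nn.Linear(user_dim, hidden)  # stand-in for the fusion auto-encoder
        self.resp_enc = nn.Linear(resp_dim, hidden)  # stand-in for the LSTM response encoder
        self.state_head = nn.Linear(hidden, num_states)
        # The change head sees user and response features *and* the state
        # prediction, mimicking the structured output layer's conditioning.
        self.change_head = nn.Linear(2 * hidden + num_states, num_changes)

    def forward(self, user_feat, resp_feat):
        u = torch.relu(self.user_enc(user_feat))
        r = torch.relu(self.resp_enc(resp_feat))
        state_logits = self.state_head(u)
        change_logits = self.change_head(
            torch.cat([u, r, state_logits.softmax(dim=1)], dim=1))
        return state_logits, change_logits  # train both heads jointly (multi-task)
```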
JPEG Decompression in the Homomorphic Encryption Domain
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240672
Xiaojing Ma, Changming Liu, Sixing Cao, Bin B. Zhu
{"title":"JPEG Decompression in the Homomorphic Encryption Domain","authors":"Xiaojing Ma, Changming Liu, Sixing Cao, Bin B. Zhu","doi":"10.1145/3240508.3240672","DOIUrl":"https://doi.org/10.1145/3240508.3240672","url":null,"abstract":"Privacy-preserving processing is desirable for cloud computing to relieve users' concern of loss of control of their uploaded data. This may be fulfilled with homomorphic encryption. With widely used JPEG, it is desirable to enable JPEG decompression in the homomorphic encryption domain. This is a great challenge since JPEG decoding needs to determine a matched codeword, which then extracts a codeword-dependent number of coefficients. With no access to the information of encrypted content, a decoder does not know which codeword is matched, and thus cannot tell how many coefficients to extract, not to mention to compute their values. In this paper, we propose a novel scheme that enables JPEG decompression in the homomorphic encryption domain. The scheme applies a statically controlled iterative procedure to decode one coefficient per iteration. In one iteration, each codeword is compared with the bitstream to compute an encrypted Boolean that represents if the codeword is a match or not. Each codeword would produce an output coefficient and generate a new bitstream by dropping consumed bits as if it were a match. If a codeword is associated with more than one coefficient, the codeword is replaced with the codeword representing the remaining undecoded coefficients for the next decoding iteration. The summation of each codeword's output multiplied by its matching Boolean is the output of the current iteration. This is equivalent to selecting the output of a matched codeword. A side benefit of our statically controlled decoding procedure is that paralleled Single-Instruction Multiple-Data (SIMD) is fully supported, wherein multiple plaintexts are encrypted into a single plaintext, and decoding a ciphertext block corresponds to decoding all corresponding plaintext blocks. SIMD also reduces the total size of ciphertexts of an image. Experimental results are reported to show the performance of our proposed scheme.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132978071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
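The statically controlled iteration lends itself to a plaintext simulation. The sketch below mirrors only the control flow: in the real scheme the match flags, coefficient values, and bitstream bits are ciphertexts and the additions and multiplications are homomorphic operations, so the same fixed sequence of steps runs regardless of the encrypted content.

```python
def decode_one_coefficient(bits, codebook):
    """bits: list of 0/1 ints; codebook: list of (codeword_bits, coeff_value).

    Exactly one codeword matches per iteration because Huffman codes are
    prefix-free, so the weighted sums below select the matched entry.
    """
    coeff = 0
    new_bits = [0] * len(bits)
    for codeword, value in codebook:
        # 1 if this codeword is a prefix of the stream, else 0
        # (an encrypted Boolean in the real scheme).
        match = int(bits[:len(codeword)] == codeword)
        # Accumulate the matched codeword's coefficient; non-matches add 0.
        coeff += match * value
        # Likewise blend in the bitstream with this codeword's bits consumed.
        shifted = bits[len(codeword):] + [0] * len(codeword)
        new_bits = [nb + match * sb for nb, sb in zip(new_bits, shifted)]
    return coeff, new_bits

# Toy codebook: "0" -> coefficient 1, "10" -> 2, "11" -> 3.
print(decode_one_coefficient([1, 0, 1, 1], [([0], 1), ([1, 0], 2), ([1, 1], 3)]))
# -> (2, [1, 1, 0, 0])
```

Note that the decoder touches every codebook entry every iteration; that data-independence is exactly what makes the procedure executable on ciphertexts and SIMD-batchable.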
To Recognize Families In the Wild: A Machine Vision Tutorial
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3241471
Joseph P. Robinson, Ming Shao, Y. Fu
{"title":"To Recognize Families In the Wild: A Machine Vision Tutorial","authors":"Joseph P. Robinson, Ming Shao, Y. Fu","doi":"10.1145/3240508.3241471","DOIUrl":"https://doi.org/10.1145/3240508.3241471","url":null,"abstract":"Automatic kinship recognition has relevance in an abundance of applications. For starters, aiding forensic investigations, as kinship is a powerful cue that could narrow the search space (e.g., knowledge that the 'Boston Bombers' were brothers could have helped identify the suspects sooner). In short, there are many beneficiaries that could result from such technologies: whether the consumer (e.g., automatic photo library management), scholar (e.g., historic lineage & genealogical studies), data analyzer (e.g., social-media- based analysis), investigator (e.g., cases of missing children and human trafficking. For instance, it is unlikely that a missing child found online would be in any database, however, more than likely a family member would be), or even refugees. Besides application- based problems, and as already hinted, kinship is a powerful cue that could serve as a face attribute capable of greatly reducing the search space in more general face-recognition problems. In this tutorial, we will introduce the background information, progress leading us up to these points, several current state-of-the-art algorithms spanning various views of the kinship recognition problem (e.g., verification, classification, tri-subject). We will then cover our large-scale Families In the Wild (FIW) image collection, several challenge competitions it as been used in, along with the top per- forming deep learning approaches. The tutorial will end with a discussion about future research directions and practical use-cases.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123022192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Connectionist Temporal Fusion for Sign Language Translation
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240671
Shuo Wang, Dan Guo, Wen-gang Zhou, Zhengjun Zha, M. Wang
{"title":"Connectionist Temporal Fusion for Sign Language Translation","authors":"Shuo Wang, Dan Guo, Wen-gang Zhou, Zhengjun Zha, M. Wang","doi":"10.1145/3240508.3240671","DOIUrl":"https://doi.org/10.1145/3240508.3240671","url":null,"abstract":"Continuous sign language translation (CSLT) is a weakly supervised problem aiming at translating vision-based videos into natural languages under complicated sign linguistics, where the ordered words in a sentence label have no exact boundary of each sign action in the video. This paper proposes a hybrid deep architecture which consists of a temporal convolution module (TCOV), a bidirectional gated recurrent unit module (BGRU), and a fusion layer module (FL) to address the CSLT problem. TCOV captures short-term temporal transition on adjacent clip features (local pattern), while BGRU keeps the long-term context transition across temporal dimension (global pattern). FL concatenates the feature embedding of TCOV and BGRU to learn their complementary relationship (mutual pattern). Thus we propose a joint connectionist temporal fusion (CTF) mechanism to utilize the merit of each module. The proposed joint CTC loss optimization and deep classification score-based decoding fusion strategy are designed to boost performance. With only once training, our model under the CTC constraints achieves comparable performance to other existing methods with multiple EM iterations. Experiments are tested and verified on a benchmark, i.e. the RWTH-PHOENIX-Weather dataset, which demonstrate the effectiveness of our proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116887927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 74
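A minimal PyTorch sketch of the three modules under CTC training, with illustrative dimensions and names; the paper's exact layer configuration is not reproduced here.

```python
import torch
import torch.nn as nn

class CTFModel(nn.Module):
    def __init__(self, feat_dim, vocab_size, hidden=256):
        super().__init__()
        # TCOV: temporal convolution over adjacent clips (local pattern).
        self.tcov = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        # BGRU: bidirectional recurrence across the sequence (global pattern).
        self.bgru = nn.GRU(feat_dim, hidden // 2, bidirectional=True,
                           batch_first=True)
        # FL: fusion over the concatenated embeddings (mutual pattern);
        # +1 output class for the CTC blank symbol.
        self.fusion = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, clips):
        # clips: (batch, time, feat_dim) clip-level features
        local = torch.relu(self.tcov(clips.transpose(1, 2))).transpose(1, 2)
        global_, _ = self.bgru(clips)
        logits = self.fusion(torch.cat([local, global_], dim=-1))
        return logits.log_softmax(dim=-1)   # (batch, time, vocab+1)
```

For training, nn.CTCLoss consumes these log-probabilities transposed to (time, batch, classes) together with the ordered word labels, which is what permits learning without per-frame sign boundaries.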
Visual Spatial Attention Network for Relationship Detection
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240611
Chaojun Han, Fumin Shen, Li Liu, Yang Yang, Heng Tao Shen
{"title":"Visual Spatial Attention Network for Relationship Detection","authors":"Chaojun Han, Fumin Shen, Li Liu, Yang Yang, Heng Tao Shen","doi":"10.1145/3240508.3240611","DOIUrl":"https://doi.org/10.1145/3240508.3240611","url":null,"abstract":"Visual relationship detection, which aims to predict a triplet with the detected objects, has attracted increasing attention in the scene understanding study. During tackling this problem, dealing with varying scales of the subjects and objects is of great importance, which has been less studied. To overcome this challenge, we propose a novel Vision Spatial Attention Network (VSA-Net), which employs a two-dimensional normal distribution attention scheme to effectively model small objects. In addition, we design a Subject-Object-layer (SO-layer) to distinguish between the subject and object to attain more precise results. To the best of our knowledge, VSA-Net is the first end-to-end attention mechanism based visual relationship detection model. Extensive experiments on the benchmark datasets (VRD and VG) show that, by using pure vision information, our VSA-Net achieves state-of-the-art performance for predicate detection, phrase detection, and relationship detection.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116994472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
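A two-dimensional normal-distribution attention mask can be sketched as a Gaussian over the spatial grid of a feature map, weighting features around a predicted object location so that small objects are not averaged away. The fixed centre and spread below are placeholders for values a network would regress; this illustrates the scheme's shape, not VSA-Net itself.

```python
import torch

def gaussian_attention(feat, cy, cx, sigma_y, sigma_x):
    """feat: (C, H, W) feature map; returns the attention-weighted map."""
    C, H, W = feat.shape
    ys = torch.arange(H, dtype=torch.float32).view(H, 1)
    xs = torch.arange(W, dtype=torch.float32).view(1, W)
    # Unnormalised 2D normal density over the spatial grid.
    att = torch.exp(-((ys - cy) ** 2 / (2 * sigma_y ** 2)
                      + (xs - cx) ** 2 / (2 * sigma_x ** 2)))
    att = att / att.sum()       # normalise into a spatial distribution
    return feat * att           # broadcast the mask over all channels

# Hypothetical usage: attend to a small object near (8, 20) in a 32x32 map.
weighted = gaussian_attention(torch.randn(256, 32, 32),
                              cy=8.0, cx=20.0, sigma_y=2.0, sigma_x=3.0)
```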
Session details: Multimodal-2 (Cross-Modal Translation)
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3286936
T. Yamasaki
{"title":"Session details: Multimodal-2 (Cross-Modal Translation)","authors":"T. Yamasaki","doi":"10.1145/3286936","DOIUrl":"https://doi.org/10.1145/3286936","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114156057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SLIONS
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240691
Dania Murad, Riwu Wang, D. Turnbull, Ye Wang
{"title":"SLIONS","authors":"Dania Murad, Riwu Wang, D. Turnbull, Ye Wang","doi":"10.1145/3240508.3240691","DOIUrl":"https://doi.org/10.1145/3240508.3240691","url":null,"abstract":"Singing songs can be an engaging and effective activity when learning a foreign language. In this paper, we describe a multi-language karaoke application called SLIONS: Singing and Listening to Improve Our Natural Speaking. When developing this application, we followed a user-centered design process which was informed by conducting interviews with domain experts, extensive usability testing, and reviewing existing gamified karaoke and language learning applications. The key feature of SLIONS is that we used automatic speech recognition (ASR) to provide students with personalized, granular feedback based on their singing pronunciation. We also provided multi-modal instruction: audio of music and singing tracks, video of a professional singer and translated text of lyrics to help students learn and master each song in the foreign language. To test the efficacy of SLIONS, we conducted a one-week pilot study with English and Chinese language learning students (N=15). The initial quantitative results show that our application can improve pronunciation and may improve vocabulary. In addition, the qualitative feedback from the students suggests that SLIONS is both fun to use and motivates students to practice speaking and singing in a foreign language.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114504066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
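The abstract does not detail how the ASR output is turned into per-word feedback, so the following is only an illustrative stand-in using the Python standard library: align the recognized words against the target lyric line and flag each lyric word as matched or missed.

```python
from difflib import SequenceMatcher

def word_feedback(lyric_line: str, asr_transcript: str):
    """Return one (word, matched?) pair per word of the lyric line."""
    lyric = lyric_line.lower().split()
    heard = asr_transcript.lower().split()
    matched = set()
    # Collect indices of lyric words that align with the ASR transcript.
    for block in SequenceMatcher(None, lyric, heard).get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return [(w, i in matched) for i, w in enumerate(lyric)]

print(word_feedback("twinkle twinkle little star",
                    "twinkle twinkle líttle star"))
# -> [('twinkle', True), ('twinkle', True), ('little', False), ('star', True)]
```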
Summary for AVEC 2018: Bipolar Disorder and Cross-Cultural Affect Recognition
Proceedings of the 26th ACM international conference on Multimedia | Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3243719
F. Ringeval, Björn Schuller, M. Valstar, R. Cowie, M. Pantic
{"title":"Summary for AVEC 2018: Bipolar Disorder and Cross-Cultural Affect Recognition","authors":"F. Ringeval, Björn Schuller, M. Valstar, R. Cowie, M. Pantic","doi":"10.1145/3240508.3243719","DOIUrl":"https://doi.org/10.1145/3240508.3243719","url":null,"abstract":"The eighth Audio-Visual Emotion Challenge and workshop AVEC 2018 was held in conjunction with ACM Multimedia'18. This year, the AVEC series addressed major novelties with three distinct sub-challenges: bipolar disorder classification, cross-cultural dimensional emotion recognition, and emotional label generation from individual ratings. The Bipolar Disorder Sub-challenge was based on a novel dataset of structured interviews of patients suffering from bipolar disorder (BD corpus), the Cross-cultural Emotion Sub-challenge relied on an extension of the SEWA dataset, which includes human-human interactions recorded 'in-the-wild' for the German and the Hungarian cultures, and the Gold-standard Emotion Sub-challenge was based on the RECOLA dataset, which was previously used in the AVEC series for emotion recognition. In this summary, we mainly describe participation and conditions of the AVEC Challenge.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115223389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6