{"title":"Historical Context-based Style Classification of Painting Images via Label Distribution Learning","authors":"Jufeng Yang, Liyi Chen, Le Zhang, Xiaoxiao Sun, Dongyu She, Shao-Ping Lu, Ming-Ming Cheng","doi":"10.1145/3240508.3240593","DOIUrl":"https://doi.org/10.1145/3240508.3240593","url":null,"abstract":"Analyzing and categorizing the style of visual art images, especially paintings, is gaining popularity owing to its importance in understanding and appreciating the art. The evolution of painting style is both continuous, in a sense that new styles may inherit, develop or even mutate from their predecessors and multi-modal because of various issues such as the visual appearance, the birthplace, the origin time and the art movement. Motivated by this peculiarity, we introduce a novel knowledge distilling strategy to assist visual feature learning in the convolutional neural network for painting style classification. More specifically, a multi-factor distribution is employed as soft-labels to distill complementary information with visual input, which extracts from different historical context via label distribution learning. The proposed method is well-encapsulated in a multi-task learning framework which allows end-to-end training. We demonstrate the superiority of the proposed method over the state-of-the-art approaches on Painting91, OilPainting, and Pandora datasets.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125681871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dest-ResNet: A Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction","authors":"Binbing Liao, Jingqing Zhang, Ming Cai, Siliang Tang, Yifan Gao, Chao Wu, Shengwen Yang, Wenwu Zhu, Yike Guo, Fei Wu","doi":"10.1145/3240508.3240656","DOIUrl":"https://doi.org/10.1145/3240508.3240656","url":null,"abstract":"With the ever-increasing urbanization process, the traffic jam has become a common problem in the metropolises around the world, making the traffic speed prediction a crucial and fundamental task. This task is difficult due to the dynamic and intrinsic complexity of the traffic environment in urban cities, yet the emergence of crowd map query data sheds new light on it. In general, a burst of crowd map queries for the same destination in a short duration (called \"hotspot'') could lead to traffic congestion. For example, queries of the Capital Gym burst on weekend evenings lead to traffic jams around the gym. However, unleashing the power of crowd map queries is challenging due to the innate spatiotemporal characteristics of the crowd queries. To bridge the gap, this paper firstly discovers hotspots underlying crowd map queries. These discovered hotspots address the spatiotemporal variations. Then Dest-ResNet (Deep spatiotemporal Residual Network) is proposed for hotspot traffic speed prediction. Dest-ResNet is a sequence learning framework that jointly deals with two sequences in different modalities, i.e., the traffic speed sequence and the query sequence. The main idea of Dest-ResNet is to learn to explain and amend the errors caused when the unimodal information is applied individually. In this way, Dest-ResNet addresses the temporal causal correlation between queries and the traffic speed. As a result, Dest-ResNet shows a 30% relative boost over the state-of-the-art methods on real-world datasets from Baidu Map.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126813619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs","authors":"Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, H. Meng","doi":"10.1145/3240508.3240575","DOIUrl":"https://doi.org/10.1145/3240508.3240575","url":null,"abstract":"Human-computer conversational interactions are increasingly pervasive in real-world applications, such as chatbots and virtual assistants. The user experience can be enhanced through affective design of such conversational dialogs, especially in enabling the computer to understand the emotive state in the user's input, and to generate an appropriate system response within the dialog turn. Such a system response may further influence the user's emotive state in the subsequent dialog turn. In this paper, we focus on the change in the user's emotive states in adjacent dialog turns, to which we refer as user emotive state change. We propose a multi-modal, multi-task deep learning framework to infer the user's emotive states and emotive state changes simultaneously. Multi-task learning convolution fusion auto-encoder is applied to fuse the acoustic and textual features to generate a robust representation of the user's input. Long-short term memory recurrent auto-encoder is employed to extract features of system responses at the sentence-level to better capture factors affecting user emotive states. Multi-task learned structured output layer is adopted to model the dependency of user emotive state change, conditioned upon the user input's emotive states and system response in current dialog turn. Experimental results demonstrate the effectiveness of the proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128107137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JPEG Decompression in the Homomorphic Encryption Domain","authors":"Xiaojing Ma, Changming Liu, Sixing Cao, Bin B. Zhu","doi":"10.1145/3240508.3240672","DOIUrl":"https://doi.org/10.1145/3240508.3240672","url":null,"abstract":"Privacy-preserving processing is desirable for cloud computing to relieve users' concern of loss of control of their uploaded data. This may be fulfilled with homomorphic encryption. With widely used JPEG, it is desirable to enable JPEG decompression in the homomorphic encryption domain. This is a great challenge since JPEG decoding needs to determine a matched codeword, which then extracts a codeword-dependent number of coefficients. With no access to the information of encrypted content, a decoder does not know which codeword is matched, and thus cannot tell how many coefficients to extract, not to mention to compute their values. In this paper, we propose a novel scheme that enables JPEG decompression in the homomorphic encryption domain. The scheme applies a statically controlled iterative procedure to decode one coefficient per iteration. In one iteration, each codeword is compared with the bitstream to compute an encrypted Boolean that represents if the codeword is a match or not. Each codeword would produce an output coefficient and generate a new bitstream by dropping consumed bits as if it were a match. If a codeword is associated with more than one coefficient, the codeword is replaced with the codeword representing the remaining undecoded coefficients for the next decoding iteration. The summation of each codeword's output multiplied by its matching Boolean is the output of the current iteration. This is equivalent to selecting the output of a matched codeword. A side benefit of our statically controlled decoding procedure is that paralleled Single-Instruction Multiple-Data (SIMD) is fully supported, wherein multiple plaintexts are encrypted into a single plaintext, and decoding a ciphertext block corresponds to decoding all corresponding plaintext blocks. SIMD also reduces the total size of ciphertexts of an image. Experimental results are reported to show the performance of our proposed scheme.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132978071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Recognize Families In the Wild: A Machine Vision Tutorial","authors":"Joseph P. Robinson, Ming Shao, Y. Fu","doi":"10.1145/3240508.3241471","DOIUrl":"https://doi.org/10.1145/3240508.3241471","url":null,"abstract":"Automatic kinship recognition has relevance in an abundance of applications. For starters, aiding forensic investigations, as kinship is a powerful cue that could narrow the search space (e.g., knowledge that the 'Boston Bombers' were brothers could have helped identify the suspects sooner). In short, there are many beneficiaries that could result from such technologies: whether the consumer (e.g., automatic photo library management), scholar (e.g., historic lineage & genealogical studies), data analyzer (e.g., social-media- based analysis), investigator (e.g., cases of missing children and human trafficking. For instance, it is unlikely that a missing child found online would be in any database, however, more than likely a family member would be), or even refugees. Besides application- based problems, and as already hinted, kinship is a powerful cue that could serve as a face attribute capable of greatly reducing the search space in more general face-recognition problems. In this tutorial, we will introduce the background information, progress leading us up to these points, several current state-of-the-art algorithms spanning various views of the kinship recognition problem (e.g., verification, classification, tri-subject). We will then cover our large-scale Families In the Wild (FIW) image collection, several challenge competitions it as been used in, along with the top per- forming deep learning approaches. The tutorial will end with a discussion about future research directions and practical use-cases.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123022192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connectionist Temporal Fusion for Sign Language Translation","authors":"Shuo Wang, Dan Guo, Wen-gang Zhou, Zhengjun Zha, M. Wang","doi":"10.1145/3240508.3240671","DOIUrl":"https://doi.org/10.1145/3240508.3240671","url":null,"abstract":"Continuous sign language translation (CSLT) is a weakly supervised problem aiming at translating vision-based videos into natural languages under complicated sign linguistics, where the ordered words in a sentence label have no exact boundary of each sign action in the video. This paper proposes a hybrid deep architecture which consists of a temporal convolution module (TCOV), a bidirectional gated recurrent unit module (BGRU), and a fusion layer module (FL) to address the CSLT problem. TCOV captures short-term temporal transition on adjacent clip features (local pattern), while BGRU keeps the long-term context transition across temporal dimension (global pattern). FL concatenates the feature embedding of TCOV and BGRU to learn their complementary relationship (mutual pattern). Thus we propose a joint connectionist temporal fusion (CTF) mechanism to utilize the merit of each module. The proposed joint CTC loss optimization and deep classification score-based decoding fusion strategy are designed to boost performance. With only once training, our model under the CTC constraints achieves comparable performance to other existing methods with multiple EM iterations. Experiments are tested and verified on a benchmark, i.e. the RWTH-PHOENIX-Weather dataset, which demonstrate the effectiveness of our proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116887927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Spatial Attention Network for Relationship Detection","authors":"Chaojun Han, Fumin Shen, Li Liu, Yang Yang, Heng Tao Shen","doi":"10.1145/3240508.3240611","DOIUrl":"https://doi.org/10.1145/3240508.3240611","url":null,"abstract":"Visual relationship detection, which aims to predict a triplet with the detected objects, has attracted increasing attention in the scene understanding study. During tackling this problem, dealing with varying scales of the subjects and objects is of great importance, which has been less studied. To overcome this challenge, we propose a novel Vision Spatial Attention Network (VSA-Net), which employs a two-dimensional normal distribution attention scheme to effectively model small objects. In addition, we design a Subject-Object-layer (SO-layer) to distinguish between the subject and object to attain more precise results. To the best of our knowledge, VSA-Net is the first end-to-end attention mechanism based visual relationship detection model. Extensive experiments on the benchmark datasets (VRD and VG) show that, by using pure vision information, our VSA-Net achieves state-of-the-art performance for predicate detection, phrase detection, and relationship detection.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116994472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Multimodal-2 (Cross-Modal Translation)","authors":"T. Yamasaki","doi":"10.1145/3286936","DOIUrl":"https://doi.org/10.1145/3286936","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114156057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLIONS","authors":"Dania Murad, Riwu Wang, D. Turnbull, Ye Wang","doi":"10.1145/3240508.3240691","DOIUrl":"https://doi.org/10.1145/3240508.3240691","url":null,"abstract":"Singing songs can be an engaging and effective activity when learning a foreign language. In this paper, we describe a multi-language karaoke application called SLIONS: Singing and Listening to Improve Our Natural Speaking. When developing this application, we followed a user-centered design process which was informed by conducting interviews with domain experts, extensive usability testing, and reviewing existing gamified karaoke and language learning applications. The key feature of SLIONS is that we used automatic speech recognition (ASR) to provide students with personalized, granular feedback based on their singing pronunciation. We also provided multi-modal instruction: audio of music and singing tracks, video of a professional singer and translated text of lyrics to help students learn and master each song in the foreign language. To test the efficacy of SLIONS, we conducted a one-week pilot study with English and Chinese language learning students (N=15). The initial quantitative results show that our application can improve pronunciation and may improve vocabulary. In addition, the qualitative feedback from the students suggests that SLIONS is both fun to use and motivates students to practice speaking and singing in a foreign language.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114504066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summary for AVEC 2018: Bipolar Disorder and Cross-Cultural Affect Recognition","authors":"F. Ringeval, Björn Schuller, M. Valstar, R. Cowie, M. Pantic","doi":"10.1145/3240508.3243719","DOIUrl":"https://doi.org/10.1145/3240508.3243719","url":null,"abstract":"The eighth Audio-Visual Emotion Challenge and workshop AVEC 2018 was held in conjunction with ACM Multimedia'18. This year, the AVEC series addressed major novelties with three distinct sub-challenges: bipolar disorder classification, cross-cultural dimensional emotion recognition, and emotional label generation from individual ratings. The Bipolar Disorder Sub-challenge was based on a novel dataset of structured interviews of patients suffering from bipolar disorder (BD corpus), the Cross-cultural Emotion Sub-challenge relied on an extension of the SEWA dataset, which includes human-human interactions recorded 'in-the-wild' for the German and the Hungarian cultures, and the Gold-standard Emotion Sub-challenge was based on the RECOLA dataset, which was previously used in the AVEC series for emotion recognition. In this summary, we mainly describe participation and conditions of the AVEC Challenge.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115223389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}