2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Papers

Joint Estimation of Age and Gender from Unconstrained Face Images Using Lightweight Multi-Task CNN for Mobile Applications
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-06-06 DOI: 10.1109/MIPR.2018.00036
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, Chu-Song Chen
Abstract: Automatic age and gender classification based on unconstrained images has become an essential technique on mobile devices. With limited computing power, developing a robust system is a challenging task. In this paper, we present an efficient convolutional neural network (CNN), called lightweight multi-task CNN, for simultaneous age and gender classification. Lightweight multi-task CNN uses depthwise separable convolution to reduce the model size and the inference time. On the challenging public Adience dataset, its age and gender classification accuracy is better than that of baseline multi-task CNN methods.
Citations: 26
A Multimodal Approach to Predict Social Media Popularity
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-05-30 DOI: 10.1109/MIPR.2018.00042
Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, R. Shah, Roger Zimmermann
Abstract: Multiple modalities represent different aspects by which information is conveyed by a data source. Modern social media platforms are one of the primary sources of multimodal data, where users share information through different modes of expression by posting textual as well as multimedia content such as images and videos. Multimodal information embedded in such posts could be useful in predicting their popularity. To the best of our knowledge, no such multimodal dataset exists for popularity prediction of social media photos. In this work, we propose a multimodal dataset consisting of content, context, and social information for popularity prediction. Specifically, we augment the SMP-T1 dataset for social media prediction in the ACM Multimedia Grand Challenge 2017 with image content, titles, descriptions, and tags. Next, we propose a multimodal approach which exploits visual features (i.e., content information), textual features (i.e., contextual information), and social features (e.g., average views and group counts) to predict the popularity of social media photos in terms of view counts. Experimental results confirm that although our multimodal approach uses half of the training dataset from SMP-T1, it achieves performance comparable to that of the state of the art.
Citations: 42
A Novel Approach of Multiple Objects Segmentation Based on Graph Cut
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00074
Jiyang Dong, Jian Xue, Shuqiang Jiang, K. Lu
Abstract: Segmentation is a crucial step in many applications. There is often more than one object to be segmented in an image or a video. Taking lung images as an example, the pulmonary lesion area and the lung parenchyma area are both important bases for a doctor's diagnosis. Because lung lesion areas and lung tissues have close gray values in the image, and because of the diversity, irregularity, and location uncertainty of pulmonary lesions, traditional segmentation methods can neither segment the objects of interest accurately nor extract them at the same time. In this paper, a novel approach is proposed for multiple objects segmentation based on Graph Cut. The algorithm introduces a multi-layer graph structure to represent different regions from inside to outside in an image. In addition, the foreground and background are modeled by Gaussian Mixture Models (GMMs), which can describe their gray-value distributions accurately. The weights of some of the links in the graph can then be calculated from the probability distributions of the models. To solve the problem of boundary leakage when two objects with similar gray values are in close proximity, a shape constraint is added to the energy function. The segmentation is achieved by max-flow/min-cut, and all of the objects can be obtained. Experimental results demonstrate that the proposed method can handle CT images of lungs with pathologies and is accurate and robust.
Citations: 2
Deep Learning of Path-Based Tree Classifiers for Large-Scale Plant Species Identification
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00013
Haixi Zhang, G. He, Jinye Peng, Zhenzhong Kuang, Jianping Fan
Abstract: In this paper, a deep learning framework is developed to enable path-based tree classifier training for supporting large-scale plant species recognition, where a deep neural network and a tree classifier are jointly trained in an end-to-end fashion. First, a two-layer plant taxonomy is constructed to organize large numbers of plant species and their genera hierarchically in a coarse-to-fine fashion. Second, a deep learning framework is developed to enable path-based tree classifier training, where a tree classifier over the plant taxonomy is used to replace the flat softmax layer in traditional deep CNNs. A path-based error function is defined to optimize the joint process for learning the deep CNN and the tree classifier, where back propagation is used to update both the classifier parameters and the network weights simultaneously. We have also constructed a large-scale plant database of the Orchid family for algorithm evaluation. Our experimental results demonstrate that our path-based deep learning algorithm can achieve very competitive results in both accuracy and computational efficiency for large-scale plant species recognition.
Citations: 21
Self-Attentive Feature-Level Fusion for Multimodal Emotion Detection
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00043
Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann
Abstract: Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on a self-attention mechanism. We also compare it with traditional fusion methods such as concatenation and outer product. Analyzed using textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms the others in the context of utterance-level emotion recognition in videos.
Citations: 44
MMH: Multi-Modal Hash for Instant Mobile Video Search
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00018
Wenhui Gao, Xinchen Liu, Huadong Ma, Yanan Li, Liang Liu
Abstract: Mobile devices have become an indispensable part of human life, enabling people to search and browse what they want on the move. Mobile video search, one of the most important services for users, still faces great challenges in mobile internet scenarios, such as limited computation ability, memory, and bandwidth. Therefore, this paper proposes a multi-modal hash based framework for instant mobile video search. In particular, we adopt an efficient deep convolutional neural network, MobileNet, with a hash layer to learn discriminative and compact visual features from videos. Moreover, we also consider a hand-crafted local visual descriptor and an audio fingerprint to build a multi-modal hash representation of videos. With the multi-modal hash code, two types of hash indexes are built on the server to achieve efficient video search. Finally, the multi-modal hash codes are extracted on the mobile devices and transferred in a three-step progressive procedure during the online search stage. Experiments on a real-world dataset show that the proposed framework not only achieves state-of-the-art accuracy but also obtains excellent efficiency.
Citations: 0
Fast Vision-Based Pedestrian Traffic Light Detection
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00050
Xue-Hua Wu, R. Hu, Yu‐Qing Bao
Abstract: Detection of pedestrian traffic lights is very important for the visually impaired. However, fast yet accurate vision-based detection is not an easy task due to the complexity of background and illumination. In this paper, a fast vision-based detection system is designed. In the designed system, a background filter is applied to identify the candidate regions of pedestrian traffic lights, and a cascade classifier obtained by the AdaBoost algorithm based on multi-layer features is used to detect the pedestrian traffic lights. Testing results verify the effectiveness of the designed system.
Citations: 4
Bayesian Regularization Based ANN for the Design of Flexible Antenna for UWB Wireless Applications
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00039
A. Hammoodi, Fadwa Al-Azzo, M. Milanova, H. Khaleel
Abstract: This paper presents a flexible pentagonal-shape Ultra-Wide Band (UWB) antenna design using an Artificial Neural Network (ANN) for WLAN, 5G, and WiMAX applications. The pentagonal patch is placed on top of a flexible polyimide substrate and simulated using the well-known 3-D electromagnetic (EM) simulator HFSS, v.18.1. Because the EM simulator requires a large computing cluster and considerable time to solve the design under consideration, an ANN is used to synthesize the design and reduce the cost and time needed to analyze the structure. A neural network with 1 hidden layer of 10 neurons based on the Bayesian Regularization algorithm is presented. An error of less than 5% is produced during the learning, validation, and testing processes. The neural network is a good candidate to represent the pentagonal-shape antenna used for UWB applications.
Citations: 6
Beyond Big Data of Human Behaviors: Modeling Human Behaviors and Deep Emotions
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00065
James J. Deng, C. Leung, Yuanxi Li
Abstract: Humans exhibit a variety of long-term or short-term behaviors such as gesture, posture, and movement. These readable behaviors usually convey significant emotional information, which can facilitate human-machine interactions in intelligent cognitive systems. However, there is a lack of studies on modeling the complex relationship between human behavior and emotion in a time series context. This paper attempts to pioneer such an exploration. First, huge amounts of human behaviors are suggested to be captured by various sensors. Then behaviors and emotions are modeled by a deep bidirectional LSTM structure, which can represent interactions and correlations. To avoid training difficulties, the bidirectional LSTM is located only in the bottom layer, the other layers are unidirectional, and adjacent layers use residual connections. This deep bidirectional LSTM has the advantage that it can be scaled up to larger varieties of human behaviors captured by multiple sensors. The experimental results show that our proposed deep structure for modeling human behaviors and emotions is able to achieve a higher degree of accuracy than shallow representations or models.
Citations: 11
Subjective Evaluation of Vector Representation of Emotion Flow for Music Retrieval
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) Pub Date : 2018-04-10 DOI: 10.1109/MIPR.2018.00075
Chia-Hao Chung, Ming-I Yang, Homer H. Chen
Abstract: Because it simply consists of an initial point and a terminal point in a two-dimensional emotion plane, vector representation of music emotion provides an intuitive and instant visualization of the dynamics of music emotion. In this paper, we investigate the performance of this representation for music information retrieval by conducting a series of subjective tests. A music retrieval system is created, and the user experience data are evaluated by seven metrics: learnability, ease of use, affordance, usefulness, joyfulness, novelty, and overall satisfaction. Compared with the point representation, the vector representation performs relatively better in affordance, novelty, and joyfulness but slightly worse in learnability and ease of use. The overall satisfaction score is 5.19 for the point representation and 5.43 for the vector representation. The results suggest that each representation has its own strengths, and the choice between the two representations depends on which metrics carry more weight in the application at hand.
Citations: 0