{"title":"Joint Estimation of Age and Gender from Unconstrained Face Images Using Lightweight Multi-Task CNN for Mobile Applications","authors":"Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, Chu-Song Chen","doi":"10.1109/MIPR.2018.00036","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00036","url":null,"abstract":"Automatic age and gender classification based on unconstrained images has become essential techniques on mobile devices. With limited computing power, how to develop a robust system becomes a challenging task. In this paper, we present an efficient convolutional neural network (CNN) called lightweight multi-task CNN for simultaneous age and gender classification. Lightweight multi-task CNN uses depthwise separable convolution to reduce the model size and save the inference time. On the public challenging Adience dataset, the accuracy of age and gender classification is better than baseline multi-task CNN methods.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134217090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multimodal Approach to Predict Social Media Popularity","authors":"Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, R. Shah, Roger Zimmermann","doi":"10.1109/MIPR.2018.00042","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00042","url":null,"abstract":"Multiple modalities represent different aspects by which information is conveyed by a data source. Modern day social media platforms are one of the primary sources of multimodal data, where users use different modes of expression by posting textual as well as multimedia content such as images and videos for sharing information. Multimodal information embedded in such posts could be useful in predicting their popularity. To the best of our knowledge, no such multimodal dataset exists for the prediction of social media photos. In this work, we propose a multimodal dataset consisiting of content, context, and social information for popularity prediction. Speci?cally, we augment the SMPT1 dataset for social media prediction in ACM Multimedia grand challenge 2017 with image content, titles, descriptions, and tags. Next, in this paper, we propose a multimodal approach which exploits visual features (i.e., content information), textual features (i.e., contextual information), and social features (e.g., average views and group counts) to predict popularity of social media photos in terms of view counts. Experimental results con?rm that despite our multimodal approach uses the half of the training dataset from SMP-T1, it achieves comparable performance with that of state-of-the-art.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126775300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Approach of Multiple Objects Segmentation Based on Graph Cut","authors":"Jiyang Dong, Jian Xue, Shuqiang Jiang, K. Lu","doi":"10.1109/MIPR.2018.00074","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00074","url":null,"abstract":"Segmentation is a very crucial step in many applications. Actually, there are often more than one object to be segmented in an image or a video. Taking the lung images as an example, pulmonary lesions area and lung parenchyma area are both important basis for a doctor to make diagnoses. Due to the fact that lung lesion areas and lung tissues have close gray values in the image, and the diversity, irregularity and location uncertainty of pulmonary lesions, traditional segmentation methods cannot segment objects of interest accurately, nor can extract them at the same time. In this paper, a novel approach is proposed for multiple objects segmentation based on Graph Cut. The algorithm introduces a multi-layers graph structure to represent different regions from inside to outside in an image. Besides, the foreground and background are modeled by Gaussian Mixture Models (GMMs) which can describe the gray distributions of them accurately. Then the weights of parts of links in the graph can be calculated by the probability distribution of the models. To solve the problem of boundaries leakage when two objects with similar gray value are in close proximity, a shape constraint is added to the energy function. The segmentation is achieved by max-flow/min-cut and all of the objects can be obtained. Experiment results demonstrate that the proposed method in this paper can deal with the CT images of lung with pathologies, and has accuracy and robustness.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124275094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning of Path-Based Tree Classifiers for Large-Scale Plant Species Identification","authors":"Haixi Zhang, G. He, Jinye Peng, Zhenzhong Kuang, Jianping Fan","doi":"10.1109/MIPR.2018.00013","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00013","url":null,"abstract":"In this paper, a deep learning framework is devel- oped to enable path-based tree classifier training for supporting large-scale plant species recognition, where a deep neural network and a tree classifier are jointly trained in an end-to-end fashion. First, a two-layer plant taxonomy is constructed to organize large numbers of plant species and their genus hierarchically in a coarse- to-fine fashion. Second, a deep learning framework is developed to enable path-based tree classifier training, where a tree classifier over the plant taxonomy is used to replace the flat softmax layer in traditional deep CNNs. A path-based error function is defined to optimize the joint process for learning deep CNN and tree classifier, where back propagation is used to update both the classifier parameters and the network weights simultaneously. We have also constructed a large-scale plant database of Orchid family for algorithm evaluation. Our experimental results have demonstrated that our path-based deep learning algorithm can achieve very competitive results on both the accuracy rates and the computational efficiency for large-scale plant species recognition.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115015080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Attentive Feature-Level Fusion for Multimodal Emotion Detection","authors":"Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann","doi":"10.1109/MIPR.2018.00043","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00043","url":null,"abstract":"Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on self-attention mechanism. We also compare it with traditional fusion methods such as concatenation, outer-product, etc. Analyzed using textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms others in the context of utterance-level emotion recognition in videos.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129145675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMH: Multi-Modal Hash for Instant Mobile Video Search","authors":"Wenhui Gao, Xinchen Liu, Huadong Ma, Yanan Li, Liang Liu","doi":"10.1109/MIPR.2018.00018","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00018","url":null,"abstract":"Mobile devices have been an indispensable part of human life, which enable people to search and browse what they want on the move. Mobile video search, as one of the most important services for users, still faces great challenges under mobile internet scenario, such as the limitation of computation ability, memory, and bandwidth. Therefore, this paper proposes a multi-modal hash based framework for instant mobile video search. In particular, we adopt a efficient deep convolutional neural network, MobileNet, with the hash layer to learn discriminative and compact visual features from videos. Moreover, we also consider hand-crafted local visual descriptor and audio fingerprint to build a multi-modal hash representation of videos. With the multi-modal hash code, two types of hash indexes are built on the server to achieve efficient video search. At last, the multi-modal hash codes are extracted on the mobile devices and transferred in a three- step progressive procedure during the online search stage. The experiments on the real-world dataset show that the proposed framework not only achieves the state-of-the-art accuracy but also obtains excellent efficiency.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126492861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Vision-Based Pedestrian Traffic Light Detection","authors":"Xue-Hua Wu, R. Hu, Yu‐Qing Bao","doi":"10.1109/MIPR.2018.00050","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00050","url":null,"abstract":"Detection of pedestrian traffic light is very important for the visually impaired. However, fast but accurate vision-based detection is not an easy task due to the complexity of background and illumination. In this paper, a fast vision-based detection system is designed. In the designed system, the background filter is applied to identify the candidate regions of pedestrian traffic lights. And the cascade classifier obtained by the Adaboost algorithm based on the multi-layer features is used to detect the pedestrian traffic lights. Testing results verifies the effectiveness of the designed system.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130856743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Regularization Based ANN for the Design of Flexible Antenna for UWB Wireless Applications","authors":"A. Hammoodi, Fadwa Al-Azzo, M. Milanova, H. Khaleel","doi":"10.1109/MIPR.2018.00039","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00039","url":null,"abstract":"This paper presents a flexible pentagonal shape Ultra-Wide Band (UWB) antenna design using Artificial Neural Network (ANN) for WLAN, 5G, and WiMAX applications. The pentagonal patch is placed on top of flexible polyimide substrate and simulated using the well-known 3-D electromagnetic (EM) simulator HFSS, v.18.1. Due to large computing cluster required by the EM simulator to solve the design under consideration in addition to the time consumed, ANN is used to synthesize the design and reduce the cost and time consumed to analyze the aforementioned structure. Neural Network with 1 hidden layer of 10 neurons based on Bayesian Regularization algorithm is presented. An error of less 5% is produced during the learning, validation, and testing processes. Neural network is a good candidate to represent the pentagonal shape antenna used for UWB applications.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"37 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131686634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Big Data of Human Behaviors: Modeling Human Behaviors and Deep Emotions","authors":"James J. Deng, C. Leung, Yuanxi Li","doi":"10.1109/MIPR.2018.00065","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00065","url":null,"abstract":"Humans possess a variety of long term or short term behaviors such as gesture, posture, and movement and so on. These readable behaviors usually convey significant emotional information, which can facilitate human-machine interactions in intelligent cognitive systems. However, there is a lack of studies on modeling such complex relationship between human behavior and emotion in a time series context. This paper attempts to pioneer such an exploration. First, huge amounts of human behaviors are suggested to be captured by various sensors. Then behaviors and emotions are modeled by deep structure of bidirectional LSTM, which can represent interactions and correlations. To avoid training difficulties, bidirectional LSTM are only located in the bottom layer, and the other layers are uni-bidirectional, while the adjacent layers use residual connections. This deep bidirectional LSTM has the advantage that it can be scaled up to larger varieties of human behaviors captured by multiple sensors. The experimental results show that our proposed deep structure for modeling human behaviors and emotions is able to achieve a high degree of accuracy than shallow representation or models.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124049105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subjective Evaluation of Vector Representation of Emotion Flow for Music Retrieval","authors":"Chia-Hao Chung, Ming-I Yang, Homer H. Chen","doi":"10.1109/MIPR.2018.00075","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00075","url":null,"abstract":"Because it simply consists of an initial point and a terminal point in a two dimensional emotion plane, vector representation of music emotion provides an intuitive and instant visualization of the dynamics of music emotion. In this paper, we investigate the performance of this representation for music information retrieval by conducting a series of subjective tests. A music retrieval system is created, and the user experience data are evaluated by seven metrics: learnability, ease of use, affordance, usefulness, joyfulness, novelty, and overall satisfaction. Compared with the point representation, the vector representation performs relatively better in affordance, novelty, and joyfulness but slightly worse in learnability and ease of use. The overall satisfaction score is 5.19 for the point representation and 5.43 for the vector representation. The results suggest that each representation has its own strengths, and the choice between the two representations depends on which metrics carry more weight in an application at hand.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127151166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}