{"title":"How competitive are you: Analysis of people's attractiveness in an online dating system","authors":"Xiaoxue Zang, T. Yamasaki, K. Aizawa, Tetsuhiro Nakamoto, E. Kuwabara, Shinichi Egami, Yusuke Fuchida","doi":"10.1109/ICME.2017.8019374","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019374","url":null,"abstract":"An increasing number of people are using dating websites to search for their life partners. This leads to the curiosity of how attractive a specific person is to the opposite gender on an average level. We propose a novel algorithm to evaluate people's objective attractiveness based on their interactions with other users on the dating websites and implement machine learning algorithms to predict their objective attractiveness ratings from their profiles. We validate our method on a large dataset gained from a Japanese dating website and yield convincing results. Our prediction based on users' profiles, which includes image and text contents, is over 80% correlated with the real values of the calculated objective attractiveness for the female and over 50% correlated with the real values of the calculated objective attractiveness for the male.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131582497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Notify-and-interact: A beacon-smartphone interaction for user engagement in galleries","authors":"Pai Chet Ng, James She, Soochang Park","doi":"10.1109/ICME.2017.8019467","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019467","url":null,"abstract":"Existing interactive systems suffer from low user engagement due to their passiveness and steep learning curve. To address these issues, this paper presents an interactive framework, Notify-and-Interact, which leverages the Bluetooth low energy (BLE) beacon infrastructure to notify and a smart-phone to interact, such that it transforms a passive interactive system into an active one. The proposed framework is demonstrated in the Ping Yuan and Kinmay W Tang Gallery, where a series of wildlife artworks are exhibited. Engagement conversion rate is measured, and users' quality of experience (QoE) is surveyed through likert assessment. Artworks with Notify-and-Interact outperforms the QR code with a high engagement conversion rate at the interaction stage, i.e., 86% over 53%, and an average engagement time of 55.67s over 28.69s, respectively. The mean opinion score (MOS) shows that around 80% of the users expressed high satisfaction with the installed Notify-and-Interact framework in the gallery.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134165335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An extended probabilistic collaborative representation based classifier for image classification","authors":"Rushi Lan, Yicong Zhou","doi":"10.1109/ICME.2017.8019308","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019308","url":null,"abstract":"Collaborative representation based classifier (CRC) and its probabilistic improvement ProCRC have achieved satisfactory performance in many image classification applications. They, however, do not comprehensively take account of the structure characteristics of the training samples. In this paper, we present an extended probabilistic collaborative representation based classifier (EProCRC) for image classification. Compared with CRC and ProCRC, the proposed EProCRC further considers a prior information that describes the distribution of each class in the training data. This prior information enlarges the margin between different classes to enhance the discriminative capacity of EProCRC. Experiments on two challenging databases, namely CUB200-2011 and Caltech-256, are conducted to evaluate EProCRC, and comparison results demonstrate that it outperforms several state-of-the-art classifiers.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125170373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cube surface modeling for human detection in crowd","authors":"Jing Li, Fangbing Zhang, Lisong Wei, Tao Yang, Zhongzhen Li","doi":"10.1109/ICME.2017.8019311","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019311","url":null,"abstract":"Human detection in dense crowds poses to be a demanding task owing to complex background and serious occlusion. In this paper, we propose a novel real-time and reliable human detection system. We solve the human detection problem by presenting a novel cube surface model captured by a binocular stereo vision camera. We first propose a cube surface model to estimate the 3D background cubes in the surveillance area. We then develop a shadow-free strategy for cube surface model updating. Thereafter, we present a shadow weighted clustering method to efficiently search for human as well as remove false alarms. Ultimately, we have developed a highly robust human detection system, and we carefully evaluate our system in many real challenge indoor and outdoor scenes. Expensive experiments demonstrate our system achieves real-time performance, higher detection rate and lower face alarms in comparison with state-of-the-art human detection methods.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114472152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image captioning with deep LSTM based on sequential residual","authors":"Kaisheng Xu, Hanli Wang, Pengjie Tang","doi":"10.1109/ICME.2017.8019408","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019408","url":null,"abstract":"Image captioning is a fundamental task which requires semantic understanding of images and the ability of generating description sentences with proper and correct structure. In consideration of the problem that language models are always shallow in modern image caption frameworks, a deep residual recurrent neural network is proposed in this work with the following two contributions. First, an easy-to-train deep stacked Long Short Term Memory (LSTM) language model is designed to learn the residual function of output distributions by adding identity mappings to multi-layer LSTMs. Second, in order to overcome the over-fitting problem caused by larger-scale parameters in deeper LSTM networks, a novel temporal Dropout method is proposed into LSTM. The experimental results on the benchmark MSCOCO and Flickr30K datasets demonstrate that the proposed model achieves the state-of-the-art performances with 101.1 in CIDEr on MSCOCO and 22.9 in B-4 on Flickr30K, respectively.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128166541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning-based adaptive tone mapping for keypoint detection","authors":"A. Rana, G. Valenzise, F. Dufaux","doi":"10.1109/ICME.2017.8019394","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019394","url":null,"abstract":"The goal of tone mapping operators (TMOs) has traditionally been to display high dynamic range (HDR) pictures in a perceptually favorable way. However, when tone-mapped images are to be used for computer vision tasks such as keypoint detection, these design approaches are suboptimal. In this paper, we propose a new learning-based adaptive tone mapping framework which aims at enhancing keypoint stability under drastic illumination variations. To this end, we design a pixel-wise adaptive TMO which is modulated based on a model derived by Support Vector Regression (SVR) using local higher order characteristics. To circumvent the difficulty to train SVR in this context, we further propose a simple detection-similarity-maximization model to generate appropriate training samples using multiple images undergoing illumination transformations. We evaluate the performance of our proposed framework in terms of keypoint repeatability for state-of-the-art keypoint detectors. Experimental results show that our proposed learning-based adaptive TMO yields higher keypoint stability when compared to existing perceptually-driven state-of-the-art TMOs.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116141685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single depth image super-resolution with multiple residual dictionary learning and refinement","authors":"Lijun Zhao, H. Bai, Jie Liang, Anhong Wang, Yao Zhao","doi":"10.1109/ICME.2017.8019331","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019331","url":null,"abstract":"Learning-based image super-resolution methods often use large datasets to learn texture features. When these methods are applied to depth images, emphasis should be given on learning the geometrical structures at object boundaries, since depth images do not have much texture information. In this paper, we develop a scheme to learn multiple residual dictionaries from only one external image. After depth image super-resolution, some artifacts may appear. An adaptive depth map refinement method is then proposed to remove these artifacts along the depth edges, based on the shape-adaptive weighted median filtering method. Experimental results demonstrate the advantage of the proposed method over many other methods.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127747589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating political leanings from mass media via graph-signal restoration with negative edges","authors":"B. Renoust, Gene Cheung, S. Satoh","doi":"10.1109/ICME.2017.8019302","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019302","url":null,"abstract":"Politicians in the same political party often share the same views on social issues and legislative agendas. By mining patterns in TV news co-appearances and Twitter followers, in this paper we estimate political leanings (left / right) of unknown individuals, and detect outlier politicians who have views different from their colleagues in the same party, from a graph signal processing (GSP) perspective. Specifically, we first construct a similarity graph with politicians as nodes, where a positive edge connects two politicians with sizable shared Twitter followers, and a negative edge connects two politicians appearing in the same TV news segment (and thus likely take opposite stands on the same issue). Given a graph with both positive and negative edges, we propose a new graph-signal smoothness prior based on a constructed generalized graph Laplacian matrix that is guaranteed to be positive semi-definite. We formulate a graph-signal restoration problem that can be solved in closed form. Experimental results show that political leanings of unknown individuals can be reliably estimated and outlier politicians can be detected.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121793891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geographic information use in weakly-supervised deep learning for landmark recognition","authors":"Yifang Yin, Zhenguang Liu, Roger Zimmermann","doi":"10.1109/ICME.2017.8019376","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019376","url":null,"abstract":"The successful deep convolutional neural networks for visual object recognition typically rely on a massive number of training images that are well annotated by class labels or object bounding boxes with great human efforts. Here we explore the use of the geographic metadata, which are automatically retrieved from sensors such as GPS and compass, in weakly-supervised learning techniques for landmark recognition. The visibility of a landmark in a frame can be calculated based on the camera's field-of-view and the landmark's geometric information such as location and height. Subsequently, a training dataset is generated as the union of the frames with presence of at least one target landmark. To reduce the impact of the intrinsic noise in the geo-metadata, we present a frame selection method that removes the mistakenly labeled frames with a two-step approach consisting of (1) Gaussian Mixture Model clustering based on camera location followed by (2) outlier removal based on visual consistency. We compare the classification results obtained from the ground truth labels and the noisy labels derived from the raw geo-metadata. Experiments show that training based on the raw geo-metadata achieves a Mean Average Precision (MAP) of 0.797. Moreover, by applying our proposed representative frame selection method, the MAP can be further improved by 6.4%, which indicates the promising use of the geo-metadata in weakly-supervised learning techniques.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115817680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic backlight scaling considering ambient luminance for mobile energy saving","authors":"Wei Sun, Guangtao Zhai, Xiongkuo Min, Yutao Liu, Siwei Ma, Jing Liu, Jiantao Zhou, Xianming Liu","doi":"10.1109/ICME.2017.8019511","DOIUrl":"https://doi.org/10.1109/ICME.2017.8019511","url":null,"abstract":"The mobile video playback involves many subsystems of the devices such as computing, rendering and displaying subsystems. Among all subsystems, the displaying subsystem accounts for at least 38% of all consumed power, and it can be up to 68% with the maximum backlight brightness. What is more, lots of people watch videos via mobile devices in various situations, where the ambient luminance condition is different. Therefore, how to save mobile energy and improve the Quality of Experience (QoE) in different situations become significant problems. In this paper, we try to maximally enhance the battery power performance under various ambient luminance conditions through backlight magnitude adjusting, while without negatively impacting users' QoE. In particular, we conduct a series of subject quality assessment experiments to uncover the quantitative relationship among QoE, ambient luminance, video content luminance and backlight level. We first study whether the continuous playback of backlight-scaled shots using the proposed scaling magnitude would cause flicker effect or not. Then motivated by the findings of these subject studies, we implement a Dynamic Backlight Scaling (DBS) strategy. The experiment results demonstrate that the DBS strategy can save more than 40% power at most and can also save 10% power even at a very high ambient luminance.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131847143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}