RECfusion: Automatic Video Curation Driven by Visual Content Popularity
A. Ortis, G. Farinella, V. D'Amico, Luca Addesso, Giovanni Torrisi, S. Battiato
DOI: 10.1145/2733373.2806311

Abstract: The proliferation of mobile devices and the diffusion of social media have changed how people communicate and share multimedia data by enabling new interaction models (e.g., social networks). At social events (e.g., concerts), automatic video understanding includes interpreting which visual contents are the most popular. The popularity of a visual content depends on how many people are looking at that scene, and can therefore be obtained through the "visual consensus" among multiple video streams acquired by different users' devices. In this work we present RECfusion, a system able to automatically create a single video from multiple video sources by taking into account the popularity of the acquired scenes. The frames composing the final popular video are selected from the different video streams by considering the visual scenes pointed at and recorded by the highest number of users' devices. Results on two benchmark datasets confirm the effectiveness of the proposed system.

Filter-Invariant Image Classification on Social Media Photos
Yu-Hsiu Chen, T. Chao, Sheng-Yi Bai, Yen-Liang Lin, Wen-Chin Chen, Winston H. Hsu
DOI: 10.1145/2733373.2806348

Abstract: With the popularity of social media nowadays, vast numbers of photos are uploaded every day. To understand image content, image classification has become an essential technique for many applications (e.g., object detection, image caption generation). Convolutional Neural Networks (CNNs) are the state-of-the-art approach for image classification. However, one characteristic of social media photos is that they often have photo filters applied, especially on Instagram. We find that prior works are unaware of this trend in social media photos and fail on filtered images. Thus, we propose a novel CNN architecture that exploits the power of pairwise constraints by combining a Siamese network and the proposed adaptive margin contrastive loss with our discriminative pair sampling method to solve the problem of filter bias. To the best of our knowledge, this is the first work to tackle filter bias in CNNs and achieve state-of-the-art performance on a filtered subset of ILSVRC2012.

Beyond Doctors: Future Health Prediction from Multimedia and Multimodal Observations
Liqiang Nie, Luming Zhang, Yi Yang, Meng Wang, Richang Hong, Tat-Seng Chua
DOI: 10.1145/2733373.2806217

Abstract: Although chronic diseases cannot be cured, they can be effectively controlled as long as we understand their progression based on current observational health records, which are often in the form of multimedia data. A large and growing body of literature has investigated the disease progression problem. However, far too little attention has been paid to jointly considering the following three observations of chronic disease progression: 1) the health statuses at different time points are chronologically similar; 2) the future health statuses of each patient can be comprehensively revealed from the current multimedia and multimodal observations, such as visual scans, digital measurements and textual medical histories; and 3) the discriminative capabilities of different modalities vary significantly across diseases. In light of these observations, we propose an adaptive multimodal multi-task learning model to co-regularize the modality agreement, temporal progression and discriminative capabilities of different modalities. We theoretically show that our proposed model is a linear system. Before training our model, we address the missing data problem via a matrix factorization approach. Extensive evaluations on a real-world Alzheimer's disease dataset verify our proposed model. It should be noted that our model is also applicable to other chronic diseases.

A Semantic Geo-Tagged Multimedia-Based Routing in a Crowdsourced Big Data Environment
F. Rehman, A. Lbath, Abdullah Murad, Mohamed Abdur Rahman, Bilal Sadiq, Akhlaq Ahmad, A. Qamar, Saleh M. Basalamah
DOI: 10.1145/2733373.2807985

Abstract: Traditional routing algorithms for calculating the fastest or shortest path become ineffective or difficult to use when both source and destination are dynamic or unknown. To solve this problem, we propose a novel semantic routing system that leverages rich, geo-tagged crowdsourced multimedia information such as images, audio, video and text to add semantics to conventional routing. Our proposed system includes a Semantic Multimedia Routing Algorithm (SMRA) that uses an indexed spatial big data environment to answer multimedia spatio-temporal queries in real time. The results are customized to the bandwidth and resolution requirements of the user's smartphone. The system is designed to handle a very large number of multimedia spatio-temporal requests at any given moment. A proof of concept of the system will be demonstrated through two scenarios: 1) multimedia-enhanced routing and 2) finding lost individuals in a large crowd using multimedia. We plan to test the system's performance and usability during Hajj 2015, where over four million pilgrims from all over the world gather to perform their rituals.
{"title":"Unsupervised Cosegmentation based on Global Graph Matching","authors":"Takanori Tamanaha, Hideki Nakayama","doi":"10.1145/2733373.2806317","DOIUrl":"https://doi.org/10.1145/2733373.2806317","url":null,"abstract":"Cosegmentation is defined as the task of segmenting a common object from multiple images. Hitherto, graph matching has been known as a promising approach because of its flexibility in matching deformable objects and regions, and several methods based on this approach have been proposed. However, candidate foregrounds obtained by a local matching algorithm in previous methods tend to include false-positive areas, particularly when visually similar backgrounds (e.g., sky) commonly appear across images. We propose an unsupervised cosegmentation method based on a global graph matching algorithm. Rather than using a local matching algorithm that finds a small common subgraph, we employ global matching that can find a one-to-one mapping for every vertex between input graphs such that we can remove negative regions estimated as background. Experimental results obtained using the iCoseg and MSRC datasets demonstrate that the accuracy of the proposed method is higher than that of previous graph-based methods.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121854365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dive into Remote Events: Omnidirectional Video Streaming with Acoustic Immersion
D. Ochi, K. Niwa, A. Kameda, Y. Kunita, Akira Kojima
DOI: 10.1145/2733373.2807963

Abstract: We propose a system that can provide the physical presence of remote events through a head-mounted display (HMD) and headphones. It can stream omnidirectional video at a high bitrate within a limited network bandwidth by not sending regions that users are not viewing. It can also reproduce binaural sound by convolving head-related transfer functions with angular region-wise separated signals. Technical demos of the system using an Oculus Rift HMD with headphones will be performed to enable users to experience the visual and acoustic immersion it provides.
{"title":"Color Photo Makeover via Crowd Sourcing and Recoloring","authors":"Wengang Cheng, Ruru Jiang, Chang Wen Chen","doi":"10.1145/2733373.2806370","DOIUrl":"https://doi.org/10.1145/2733373.2806370","url":null,"abstract":"It is not always easy for amateur photographers to capture photos with desired colors even on a classic hot spot as the appearance of color photo dependent on many factors. This paper proposes a novel approach to recolor given photos via a crowdsourcing based makeover scheme. When a user input a photo to be recolored, the proposed system will first conduct favorite exemplars suggestion from the images hosted by the social media sites, by jointly leveraging contextual and visual information associated with the images. The recommended exemplars shall reveal the scene and context dependent color compositions and provide users with diverse possible color styles. Then, a novel superpixel-based recoloring scheme, incorporating color statistics, texture characteristics and spatial constraints into soft matching, is applied to generate new photos of desired color. Experiments and a user study demonstrate that the proposed color photo makeover is able to achieve robust recoloring results for various outdoor photos.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117036327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Joint Modeling of Users' Interests and Mobility Patterns for Point-of-Interest Recommendation
Hongzhi Yin, B. Cui, Zi Huang, Weiqing Wang, X. Wu, Xiaofang Zhou
DOI: 10.1145/2733373.2806339

Abstract: Point-of-Interest (POI) recommendation has become an important means to help people discover interesting places, especially when users travel out of town. However, the extreme sparsity of the user-POI matrix creates a severe challenge. To cope with this challenge, we propose a unified probabilistic generative model, the Topic-Region Model (TRM), to simultaneously discover the semantic, temporal and spatial patterns of users' check-in activities, and to model their joint effect on users' decision-making for POIs. We conduct extensive experiments to evaluate the performance of TRM on two large-scale real-world datasets, and the experimental results clearly demonstrate that TRM outperforms state-of-the-art methods.
{"title":"Human Action Recognition With Trajectory Based Covariance Descriptor In Unconstrained Videos","authors":"Hanli Wang, Yun Yi, Jun Wu","doi":"10.1145/2733373.2806310","DOIUrl":"https://doi.org/10.1145/2733373.2806310","url":null,"abstract":"Human action recognition from realistic videos plays a key role in multimedia event detection and understanding. In this paper, a novel Trajectory Based Covariance (TBC) descriptor is proposed, which is formulated along the dense trajectories. To map the descriptor matrix to vector space and trim out the redundancy of data, the TBC descriptor matrix is projected to Euclidean space by the Logarithm Principal Components Analysis (LogPCA). Our method is tested on the challenging Hollywood2 and TV Human Interaction datasets. Experimental results show that the proposed TBC descriptor outperforms three baseline descriptors (i.e., histogram of oriented gradient, histogram of optical flow and motion boundary histogram), and our method achieves better recognition performances than a number of state-of-the-art approaches.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121337938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting and Understanding Urban Perception with Convolutional Neural Networks","authors":"L. Porzi, S. R. Bulò, B. Lepri, E. Ricci","doi":"10.1145/2733373.2806273","DOIUrl":"https://doi.org/10.1145/2733373.2806273","url":null,"abstract":"Cities' visual appearance plays a central role in shaping human perception and response to the surrounding urban environment. For example, the visual qualities of urban spaces affect the psychological states of their inhabitants and can induce negative social outcomes. Hence, it becomes critically important to understand people's perceptions and evaluations of urban spaces. Previous works have demonstrated that algorithms can be used to predict high level attributes of urban scenes (e.g. safety, attractiveness, uniqueness), accurately emulating human perception. In this paper we propose a novel approach for predicting the perceived safety of a scene from Google Street View Images. Opposite to previous works, we formulate the problem of learning to predict high level judgments as a ranking task and we employ a Convolutional Neural Network (CNN), significantly improving the accuracy of predictions over previous methods. Interestingly, the proposed CNN architecture relies on a novel pooling layer, which permits to automatically discover the most important areas of the images for predicting the concept of perceived safety. An extensive experimental evaluation, conducted on the publicly available Place Pulse dataset, demonstrates the advantages of the proposed approach over state-of-the-art methods.","PeriodicalId":427170,"journal":{"name":"Proceedings of the 23rd ACM international conference on Multimedia","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121420782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}