Proceedings of the 23rd ACM international conference on Multimedia: Latest Articles

RECfusion: Automatic Video Curation Driven by Visual Content Popularity
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806311
A. Ortis, G. Farinella, V. D'Amico, Luca Addesso, Giovanni Torrisi, S. Battiato
Abstract: The proliferation of mobile devices and the diffusion of social media have changed the communication paradigm of people that share multimedia data by allowing new interaction models (e.g., social networks). In social events (e.g., concerts), the automatic video understanding goal includes the interpretation of which visual contents are the most popular. The popularity of a visual content depends on how many people are looking at that scene, and therefore it could be obtained through the "visual consensus" among multiple video streams acquired by the different users devices. In this work we present RECfusion, a system able to automatically create a single video from multiple video sources by taking into account the popularity of the acquired scenes. The frames composing the final popular video are selected from the different video streams by considering those visual scenes which are pointed and recorded by the highest number of users' devices. Results on two benchmark datasets confirm the effectiveness of the proposed system.
Citations: 19
Filter-Invariant Image Classification on Social Media Photos
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806348
Yu-Hsiu Chen, T. Chao, Sheng-Yi Bai, Yen-Liang Lin, Wen-Chin Chen, Winston H. Hsu
Abstract: With the popularity of social media nowadays, tons of photos are uploaded everyday. To understand the image content, image classification becomes a very essential technique for plenty of applications (e.g., object detection, image caption generation). Convolutional Neural Network (CNN) has been shown as the state-of-the-art approach for image classification. However, one of the characteristics in social media photos is that they are often applied with photo filters, especially on Instagram. We find that prior works do not aware of this trend in social media photos and fail on filtered images. Thus, we propose a novel CNN architecture that utilizes the power of pairwise constraint by combining Siamese network and the proposed adaptive margin contrastive loss with our discriminative pair sampling method to solve the problem of filter bias. To the best of our knowledge, this is the first work to tackle filter bias on CNN and achieve state-of-the-art performance on a filtered subset of ILSVRC2012.
Citations: 15
Beyond Doctors: Future Health Prediction from Multimedia and Multimodal Observations
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806217
Liqiang Nie, Luming Zhang, Yi Yang, Meng Wang, Richang Hong, Tat-Seng Chua
Abstract: Although chronic diseases cannot be cured, they can be effectively controlled as long as we understand their progressions based on the current observational health records, which is often in the form of multimedia data. A large and growing body of literature has investigated the disease progression problem. However, far too little attention to date has been paid to jointly consider the following three observations of the chronic disease progression: 1) the health statuses at different time points are chronologically similar; 2) the future health statuses of each patient can be comprehensively revealed from the current multimedia and multimodal observations, such as visual scans, digital measurements and textual medical histories; and 3) the discriminative capabilities of different modalities vary significantly in accordance to specific diseases. In the light of these, we propose an adaptive multimodal multi-task learning model to co-regularize the modality agreement, temporal progression and discriminative capabilities of different modalities. We theoretically show that our proposed model is a linear system. Before training our model, we address the data missing problem via the matrix factorization approach. Extensive evaluations on a real-world Alzheimer's disease dataset well verify our proposed model. It should be noted that our model is also applicable to other chronic diseases.
Citations: 112
A Semantic Geo-Tagged Multimedia-Based Routing in a Crowdsourced Big Data Environment
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2807985
F. Rehman, A. Lbath, Abdullah Murad, Mohamed Abdur Rahman, Bilal Sadiq, Akhlaq Ahmad, A. Qamar, Saleh M. Basalamah
Abstract: Traditional routing algorithms for calculating the fastest or shortest path become ineffective or difficult to use when both source and destination are dynamic or unknown. To solve the problem, we propose a novel semantic routing system that leverages geo-tagged rich crowdsourced multimedia information such as images, audio, video and text to add semantics to the conventional routing. Our proposed system includes a Semantic Multimedia Routing Algorithm (SMRA) that uses an indexed spatial big data environment to answer multimedia spatio-temporal queries in real-time. The results are customized to the users' smartphone bandwidth and resolution requirements. The system has been designed to be able to handle a very large number of multimedia spatio-temporal requests at any given moment. A proof of concept of the system will be demonstrated through two scenarios. These are 1) multimedia enhanced routing and 2) finding lost individuals in a large crowd using multimedia. We plan to test the system's performance and usability during Hajj 2015, where over four million pilgrims from all over the world gather to perform their rituals.
Citations: 5
Unsupervised Cosegmentation based on Global Graph Matching
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806317
Takanori Tamanaha, Hideki Nakayama
Abstract: Cosegmentation is defined as the task of segmenting a common object from multiple images. Hitherto, graph matching has been known as a promising approach because of its flexibility in matching deformable objects and regions, and several methods based on this approach have been proposed. However, candidate foregrounds obtained by a local matching algorithm in previous methods tend to include false-positive areas, particularly when visually similar backgrounds (e.g., sky) commonly appear across images. We propose an unsupervised cosegmentation method based on a global graph matching algorithm. Rather than using a local matching algorithm that finds a small common subgraph, we employ global matching that can find a one-to-one mapping for every vertex between input graphs such that we can remove negative regions estimated as background. Experimental results obtained using the iCoseg and MSRC datasets demonstrate that the accuracy of the proposed method is higher than that of previous graph-based methods.
Citations: 2
Dive into Remote Events: Omnidirectional Video Streaming with Acoustic Immersion
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2807963
D. Ochi, K. Niwa, A. Kameda, Y. Kunita, Akira Kojima
Abstract: We propose a system that can provide the physical presence of remote events through a head mount display (HMD) and a headphone. It can stream omnidirectional video within a limited network bandwidth at a high bitrate without sending regions that users are not viewing. It can also reproduce binaural sounds by convoluting head related transfer functions and angular region-wise separated signals. Technical demos of the system using an Oculus Rift HMD with a headphone will be performed to enable users to experience the visual and acoustic immersion it provides.
Citations: 6
Color Photo Makeover via Crowd Sourcing and Recoloring
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806370
Wengang Cheng, Ruru Jiang, Chang Wen Chen
Abstract: It is not always easy for amateur photographers to capture photos with desired colors even on a classic hot spot as the appearance of color photo dependent on many factors. This paper proposes a novel approach to recolor given photos via a crowdsourcing based makeover scheme. When a user input a photo to be recolored, the proposed system will first conduct favorite exemplars suggestion from the images hosted by the social media sites, by jointly leveraging contextual and visual information associated with the images. The recommended exemplars shall reveal the scene and context dependent color compositions and provide users with diverse possible color styles. Then, a novel superpixel-based recoloring scheme, incorporating color statistics, texture characteristics and spatial constraints into soft matching, is applied to generate new photos of desired color. Experiments and a user study demonstrate that the proposed color photo makeover is able to achieve robust recoloring results for various outdoor photos.
Citations: 4
Joint Modeling of Users' Interests and Mobility Patterns for Point-of-Interest Recommendation
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806339
Hongzhi Yin, B. Cui, Zi Huang, Weiqing Wang, X. Wu, Xiaofang Zhou
Abstract: Point-of-Interest (POI) recommendation has become an important means to help people discover interesting places, especially when users travel out of town. However, extreme sparsity of user-POI matrix creates a severe challenge. To cope with this challenge, we propose a unified probabilistic generative model, Topic-Region Model (TRM), to simultaneously discover the semantic, temporal and spatial patterns of users' check-in activities, and to model their joint effect on users' decision-making for POIs. We conduct extensive experiments to evaluate the performance of our TRM on two real large-scale datasets, and the experimental results clearly demonstrate that TRM outperforms the state-of-art methods.
Citations: 62
Human Action Recognition With Trajectory Based Covariance Descriptor In Unconstrained Videos
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806310
Hanli Wang, Yun Yi, Jun Wu
Abstract: Human action recognition from realistic videos plays a key role in multimedia event detection and understanding. In this paper, a novel Trajectory Based Covariance (TBC) descriptor is proposed, which is formulated along the dense trajectories. To map the descriptor matrix to vector space and trim out the redundancy of data, the TBC descriptor matrix is projected to Euclidean space by the Logarithm Principal Components Analysis (LogPCA). Our method is tested on the challenging Hollywood2 and TV Human Interaction datasets. Experimental results show that the proposed TBC descriptor outperforms three baseline descriptors (i.e., histogram of oriented gradient, histogram of optical flow and motion boundary histogram), and our method achieves better recognition performances than a number of state-of-the-art approaches.
Citations: 11
Predicting and Understanding Urban Perception with Convolutional Neural Networks
Proceedings of the 23rd ACM international conference on Multimedia Pub Date : 2015-10-13 DOI: 10.1145/2733373.2806273
L. Porzi, S. R. Bulò, B. Lepri, E. Ricci
Abstract: Cities' visual appearance plays a central role in shaping human perception and response to the surrounding urban environment. For example, the visual qualities of urban spaces affect the psychological states of their inhabitants and can induce negative social outcomes. Hence, it becomes critically important to understand people's perceptions and evaluations of urban spaces. Previous works have demonstrated that algorithms can be used to predict high level attributes of urban scenes (e.g. safety, attractiveness, uniqueness), accurately emulating human perception. In this paper we propose a novel approach for predicting the perceived safety of a scene from Google Street View Images. Opposite to previous works, we formulate the problem of learning to predict high level judgments as a ranking task and we employ a Convolutional Neural Network (CNN), significantly improving the accuracy of predictions over previous methods. Interestingly, the proposed CNN architecture relies on a novel pooling layer, which permits to automatically discover the most important areas of the images for predicting the concept of perceived safety. An extensive experimental evaluation, conducted on the publicly available Place Pulse dataset, demonstrates the advantages of the proposed approach over state-of-the-art methods.
Citations: 110