{"title":"Multi-task Deep Neural Network for Joint Face Recognition and Facial Attribute Prediction","authors":"Zhanxiong Wang, Keke He, Yanwei Fu, Rui Feng, Yu-Gang Jiang, X. Xue","doi":"10.1145/3078971.3078973","DOIUrl":"https://doi.org/10.1145/3078971.3078973","url":null,"abstract":"Deep neural networks have significantly improved the performance of face recognition and facial attribute prediction, which however are still very challenging on the million scale dataset, i.e. MegaFace. In this paper, we for the first time, advocate a multi-task deep neural network for jointly learning face recognition and facial attribute prediction tasks. Extensive experimental evaluation clearly demonstrates the effectiveness of our architecture. Remarkably, on the largest face recognition benchmark -- MegaFace dataset, our networks can achieve the Rank-1 identication accuracy of 77.74% and face verication accuracy 79.24% TAR at 10-6 FAR, which are the best performance on the small protocol among all the publicly released methods.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124756063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"With 5G Approaching, How will Audio/Video Technology that Serves 800 Million QQ Users Bring Forth New Ideas","authors":"Xiaozheng Huang","doi":"10.1145/3078971.3081369","DOIUrl":"https://doi.org/10.1145/3078971.3081369","url":null,"abstract":"Back to 1999, a popular IM QQ in China, stilled called OICQ at that time, released a new version, which included the functionality of audio call for the first time. Not much time later, video call was also enabled. After 18 years of fast growing, QQ has 800 million monthly active users.QQ users spend 1.2 billion minutes for audio and video call every single day. With QQ's fast growing, the audio and video technology behind it also evolves tremendously. We build our own audio/video technology center, which grows to Tencent Audio/Video Lab, and develops our own SDK when OEM cannot meet our needs. The new generation of audio/video communication engine \"SPEAR\", developed by our own, serves 800 million QQ users today. Our web broadcasting solution serves China's 10 top web broadcasting platforms, with 200 million user base and 70% market share of China. With 5G approaching, how will audio/video technology that serves 800 million QQ users bring forth new ideas? In this presentation, I will firstly introduce how the audio/video technology develops in Tencent Audio/Video Lab while internet transferring from PC to mobile. Secondly, I will explain the capability of our technology in the field of audio/video web communication, web broadcasting and image/audio/video processing. Thirdly, I will present our new research results and how they are used in our products and services. Then, I will talk a little about our future plan.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126131772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Geo-Privacy Bonus of Popular Photo Enhancements","authors":"Jaeyoung Choi, M. Larson, Xinchao Li, Kevin Li, G. Friedland, A. Hanjalic","doi":"10.1145/3078971.3080543","DOIUrl":"https://doi.org/10.1145/3078971.3080543","url":null,"abstract":"Today's geo-location estimation approaches are able to infer the location of a target image using its visual content alone. These approaches typically exploit visual matching techniques, applied to a large collection of background images with known geo-locations. Users who are unaware that visual analysis and retrieval approaches can compromise their geo-privacy, unwittingly open themselves to risks of crime or other unintended consequences. This paper lays the groundwork for a new approach to geo-privacy of social images: Instead of requiring a change of user behavior, we start by investigating users' existing photo-sharing practices. We carry out a series of experiments using a large collection of social images (8.5M) to systematically analyze how photo editing practices impact the performance of geo-location estimation. We find that standard image enhancements, including filters and cropping, already serve as natural geo-privacy protectors. In our experiments, up to 19% of images whose location would otherwise be automatically predictable were unlocalizeable after enhancement. We conclude that it would be wrong to assume that geo-visual privacy is a lost cause in today's world of rapidly maturing machine learning. Instead, protecting users against the unwanted effects of pixel-based inference is a viable research field. A starting point is understanding the geo-privacy bonus of already established user behavior.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129067227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete Multi-view Hashing for Effective Image Retrieval","authors":"Rui Yang, Yuliang Shi, Xin-Shun Xu","doi":"10.1145/3078971.3078981","DOIUrl":"https://doi.org/10.1145/3078971.3078981","url":null,"abstract":"Recently, hashing techniques have witnessed an increase in popularity due to their low storage cost and high query speed for large scale data retrieval task, e.g., image retrieval. Many methods have been proposed; however, most existing hashing techniques focus on single view data. In many scenarios, there are multiple views in data samples. Thus, those methods working on single view can not make full use of rich information contained in multi-view data. Although some methods have been proposed for multi-view data; they usually relax binary constraints or separate the process of learning hash functions and binary codes into two independent stages to bypass the obstacle of handling the discrete constraints on binary codes for optimization, which may generate large quantization error. To consider these problems, in this paper, we propose a novel hashing method, i.e., Discrete Multi-view Hashing (DMVH), which can work on multi-view data directly and make full use of rich information in multi-view data. Moreover, in DMVH, we optimize discrete codes directly instead of relaxing the binary constraints so that we could obtain high-quality hash codes. Simultaneously, we present a novel approach to construct similarity matrix, which can not only preserve local similarity structure, but also keep semantic similarity between data points. To solve the optimization problem in DMVH, we further propose an alternate algorithm. We test the proposed model on three large scale data sets. Experimental results show that it outperforms or is comparable to several state-of-the-arts.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128969147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking Multimedia Content for Efficient News Browsing","authors":"R. Bois, G. Gravier, Eric Jamet, E. Morin, Maxime Robert, P. Sébillot","doi":"10.1145/3078971.3079023","DOIUrl":"https://doi.org/10.1145/3078971.3079023","url":null,"abstract":"As the amount of news information available online grows, media are in need of advanced tools to explore the information surrounding specific events before writing their own piece of news, e.g., adding context and insight. While many tools exist to extract information from large datasets, they do not offer an easy way to gain insight from a news collection by browsing, going from article to article and viewing unaltered original content. Such browsing tools require the creation of rich underlying structures such as graph representations. These representations can be further enhanced by typing links that connect nodes, in order to inform the user on the nature of their relation. In this article, we introduce an efficient way to generate links between news items in order to obtain an easily navigable graph, and enrich this graph by automatically typing created links. User evaluations are conducted on real world data in order to assess for the interest of both the graph representation and link typing in a press reviewing task, showing a significant improvement compared to classical search engines.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124198508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 1: Vision and Language (Oral Presentations)","authors":"H. Cucu","doi":"10.1145/3254615","DOIUrl":"https://doi.org/10.1145/3254615","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125695521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Automatic Identification of Music for Common Activities","authors":"Karthik Yadati, Cynthia C. S. Liem, M. Larson, A. Hanjalic","doi":"10.1145/3078971.3078997","DOIUrl":"https://doi.org/10.1145/3078971.3078997","url":null,"abstract":"In this paper, we address the challenge of identifying music suitable to accompany typical daily activities. We first derive a list of common activities by analyzing social media data. Then, an automatic approach is proposed to find music for these activities. Our approach is inspired by our experimentally acquired findings (a) that genre and instrument information, i.e., as appearing in the textual metadata, are not sufficient to distinguish music appropriate for different types of activities, and (b) that existing content-based approaches in the music information retrieval community do not overcome this insufficiency. The main contributions of our work are (a) our analysis of the properties of activity-related music that inspire our use of novel high-level features, e.g., drop-like events, and (b) our approach's novel method of extracting and combining low-level features, and, in particular, the joint optimization of the time window for feature aggregation and the number of features to be used. The effectiveness of the approach method is demonstrated in a comprehensive experimental study including failure analysis.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131180173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Saliency Estimation and Matching using Image Regions for Geo-Localization of Online Video","authors":"Freda Shi, Jia Chen, Alexander Hauptmann","doi":"10.1145/3078971.3078996","DOIUrl":"https://doi.org/10.1145/3078971.3078996","url":null,"abstract":"In this paper, we study automatic geo-localization of online event videos. Different from general image localization task through matching, the appearance of an environment during significant events varies greatly from its daily appearance, since there are usually crowds, decorations or even destruction when a major event happens. This introduces a major challenge: matching the event environment to the daily environment, e.g. as recorded by Google Street View. We observe that some regions in the image, as part of the environment, still preserve the daily appearance even though the whole image (environment) looks quite different. Based on this observation, we formulate the problem as joint saliency estimation and matching at the image region level, as opposed to the key point or whole-image level. As image-level labels of daily environment are easily generated with GPS information, we treat region based saliency estimation and matching as a weakly labeled learning problem over the training data. Our solution is to iteratively optimize saliency and the region-matching model. For saliency optimization, we derive a closed form solution, which has an intuitive explanation. For region matching model optimization, we use self-paced learning to learn from the pseudo labels generated by (sub-optimal) saliency values. We conduct extensive experiments on two challenging public datasets: Boston Marathon 2013 and Tokyo Time Machine. Experimental results show that our solution significantly improves over matching on whole images and the automatically learned saliency is a strong predictor of distinctive building areas.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132877260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Multi-modal Prior Knowledge for Large-scale Concept Learning in Noisy Web Data","authors":"Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann","doi":"10.1145/3078971.3079003","DOIUrl":"https://doi.org/10.1145/3078971.3079003","url":null,"abstract":"Learning video concept detectors automatically from the big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and the machine learning community. A considerable amount of videos on the web is associated with rich but noisy contextual information, such as the title and other multi-modal information, which provides weak annotations or labels about the video content. To tackle the problem of large-scale noisy learning, We propose a novel method called Multi-modal WEbly-Labeled Learning (WELL-MM), which is established on the state-of-the-art machine learning algorithm inspired by the learning process of human. WELL-MM introduces a novel multi-modal approach to incorporate meaningful prior knowledge called curriculum from the noisy web videos. We empirically study the curriculum constructed from the multi-modal features of the Internet videos and images. The comprehensive experimental results on FCVID and YFCC100M demonstrate that WELL-MM outperforms state-of-the-art studies by a statically significant margin on learning concepts from noisy web video data. In addition, the results also verify that WELL-MM is robust to the level of noisiness in the video data. Notably, WELL-MM trained on sufficient noisy web labels is able to achieve a better accuracy to supervised learning methods trained on the clean manually labeled data.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133626311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Fast Style Transfer Network","authors":"Keiji Yanai, Ryosuke Tanno","doi":"10.1145/3078971.3079037","DOIUrl":"https://doi.org/10.1145/3078971.3079037","url":null,"abstract":"In this paper, we propose a conditional fast neural style transfer network. We extend the network proposed as a fast neural style transfer network by Johnson et al. [1] so that the network can learn multiple styles at the same time. To do that, we add a conditional input which selects a style to be transferred out of the trained styles. In addition, we show that the proposed network can mix multiple styles, although the network is trained with each of the training styles independently. The proposed network can also transfer different styles to the different parts of a given image at the same time, which we call \"spatial style transfer\". In the experiments, we confirmed that no quality degradation occurred in the multi-style network compared to the single network, and linear-weighted multi-style fusion enabled us to generate various kinds of new styles which are different from the trained single styles.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126546007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}