Proceedings of the 21st ACM international conference on Multimedia: Latest Publications

Strong geometrical consistency in large scale partial-duplicate image search
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502166
Junqiang Wang, Jinhui Tang, Yu-Gang Jiang
{"title":"Strong geometrical consistency in large scale partial-duplicate image search","authors":"Junqiang Wang, Jinhui Tang, Yu-Gang Jiang","doi":"10.1145/2502081.2502166","DOIUrl":"https://doi.org/10.1145/2502081.2502166","url":null,"abstract":"The state-of-the-art partial-duplicate image search systems reply heavily on the match of local features like SIFT. Independently matching local features across two images ignores the overall geometry structure and therefore may incur many false matches. To reduce such matches, several geometry verification methods have been proposed. This paper introduces a new geometry verification method named as Strong Geometry Consistency (SGC), which uses the orientation, scale and location information of the local feature points to accurately and quickly remove the false matches. We also propose a simple scale weighting (SW) strategy, which gives feature points with larger scales greater weights, based on the intuition that a larger-scale feature point tends to be more robust for image search as it occupies a larger area of an image. Extensive experiments performed on three popular datasets show that SGC significantly outperforms state-of-the-art geometry verification methods, and SW can further boost the performance with marginal additional computation.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80363562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
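
The two ideas in this abstract, checking that matched keypoints agree on one dominant rotation and scaling and then weighting the surviving matches by scale, can be illustrated with a minimal NumPy sketch. The median-based consensus and the tolerance values below are assumptions for illustration, not the actual SGC/SW algorithm from the paper:

```python
import numpy as np

def filter_and_score_matches(kp_a, kp_b, angle_tol=20.0, log_scale_tol=0.5):
    """Filter tentative SIFT matches by consistency of rotation and scaling,
    then score the kept matches with scale-based weights.

    kp_a, kp_b: (n, 4) arrays of (x, y, scale, orientation_deg) for the matched
    keypoints in image A and image B; row i of kp_a matches row i of kp_b.
    """
    # Orientation change and log scale ratio implied by each match.
    d_angle = (kp_b[:, 3] - kp_a[:, 3] + 180.0) % 360.0 - 180.0
    d_logs = np.log(kp_b[:, 2] / kp_a[:, 2])

    # A genuine duplicated region implies one dominant rotation/scaling, so keep
    # only matches close to the robust (median) estimate.
    keep = (np.abs(d_angle - np.median(d_angle)) < angle_tol) & \
           (np.abs(d_logs - np.median(d_logs)) < log_scale_tol)

    # Scale weighting: larger-scale keypoints cover more of the image and are
    # treated as more reliable, so they contribute more to the similarity score.
    weights = np.minimum(kp_a[:, 2], kp_b[:, 2])
    return keep, float(np.sum(weights[keep]))
```
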
Correlated-spaces regression for learning continuous emotion dimensions
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502201
M. Nicolaou, S. Zafeiriou, M. Pantic
{"title":"Correlated-spaces regression for learning continuous emotion dimensions","authors":"M. Nicolaou, S. Zafeiriou, M. Pantic","doi":"10.1145/2502081.2502201","DOIUrl":"https://doi.org/10.1145/2502081.2502201","url":null,"abstract":"Adopting continuous dimensional annotations for affective analysis has been gaining rising attention by researchers over the past years. Due to the idiosyncratic nature of this problem, many subproblems have been identified, spanning from the fusion of multiple continuous annotations to exploiting output-correlations amongst emotion dimensions. In this paper, we firstly empirically answer several important questions which have found partial or no answer at all so far in related literature. In more detail, we study the correlation of each emotion dimension (i) with respect to other emotion dimensions, (ii) to basic emotions (e.g., happiness, anger). As a measure for comparison, we use video and audio features. Interestingly enough, we find that (i) each emotion dimension is more correlated with other emotion dimensions rather than with face and audio features, and similarly (ii) that each basic emotion is more correlated with emotion dimensions than with audio and video features. A similar conclusion holds for discrete emotions which are found to be highly correlated to emotion dimensions as compared to audio and/or video features. Motivated by these findings, we present a novel regression algorithm (Correlated-Spaces Regression, CSR), inspired by Canonical Correlation Analysis (CCA) which learns output-correlations and performs supervised dimensionality reduction and multimodal fusion by (i) projecting features extracted from all modalities and labels onto a common space where their inter-correlation is maximised and (ii) learning mappings from the projected feature space onto the projected, uncorrelated label space.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82881414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
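
A rough sketch of the two-step recipe described above (project features and labels onto a maximally correlated common space, then regress in that space) might look as follows; the random placeholder data, scikit-learn's CCA and the ridge regressor are stand-ins for illustration, not the authors' CSR implementation:

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))   # placeholder audio/video features per frame
Y = rng.normal(size=(200, 3))    # placeholder continuous emotion annotations

# Step 1: project features and labels onto a common space where their
# correlation is maximised (the CCA-inspired part of CSR).
cca = CCA(n_components=3)
Xc, Yc = cca.fit_transform(X, Y)

# Step 2: learn a mapping from the projected feature space to the projected
# (uncorrelated) label space.
reg = Ridge(alpha=1.0).fit(Xc, Yc)
Yc_pred = reg.predict(cca.transform(X))
```

At test time the prediction in the projected label space would still have to be mapped back to the original emotion dimensions via the learned CCA loadings, which this sketch omits.
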
Undo the codebook bias by linear transformation for visual applications
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502141
Chunjie Zhang, Yifan Zhang, Shuhui Wang, Junbiao Pang, Chao Liang, Qingming Huang, Q. Tian
{"title":"Undo the codebook bias by linear transformation for visual applications","authors":"Chunjie Zhang, Yifan Zhang, Shuhui Wang, Junbiao Pang, Chao Liang, Qingming Huang, Q. Tian","doi":"10.1145/2502081.2502141","DOIUrl":"https://doi.org/10.1145/2502081.2502141","url":null,"abstract":"The bag of visual words model (BoW) and its variants have demonstrate their effectiveness for visual applications and have been widely used by researchers. The BoW model first extracts local features and generates the corresponding codebook, the elements of a codebook are viewed as visual words. The local features within each image are then encoded to get the final histogram representation. However, the codebook is dataset dependent and has to be generated for each image dataset. This costs a lot of computational time and weakens the generalization power of the BoW model. To solve these problems, in this paper, we propose to undo the dataset bias by codebook linear transformation. To represent every points within the local feature space using Euclidean distance, the number of bases should be no less than the space dimensions. Hence, each codebook can be viewed as a linear transformation of these bases. In this way, we can transform the pre-learned codebooks for a new dataset. However, not all of the visual words are equally important for the new dataset, it would be more effective if we can make some selection using sparsity constraints and choose the most discriminative visual words for transformation. We propose an alternative optimization algorithm to jointly search for the optimal linear transformation matrixes and the encoding parameters. Image classification experimental results on several image datasets show the effectiveness of the proposed method.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87800082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
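
The alternating optimization idea (fix the transformation and encode, then fix the codes and refit the transformation) can be sketched as follows; ridge encoding replaces the paper's sparsity-constrained encoding, and the update rules are simplified assumptions rather than the authors' algorithm:

```python
import numpy as np

def adapt_codebook(B_old, X_new, n_iter=10, lam=0.1):
    """Alternately (i) encode the new dataset's features with the transformed
    codebook and (ii) refit the linear transformation T, so that T @ B_old
    serves as a codebook adapted to the new dataset.

    B_old: (k, d) pre-learned codebook; X_new: (n, d) local features of the
    new dataset.
    """
    k, _ = B_old.shape
    T = np.eye(k)                                   # start from the identity transform
    for _ in range(n_iter):
        B = T @ B_old                               # current transformed codebook
        # Fix T, solve for the codes C (row-wise ridge regression onto B).
        C = X_new @ B.T @ np.linalg.inv(B @ B.T + lam * np.eye(k))
        # Fix C, solve min_T ||X_new - C @ T @ B_old||_F via pseudo-inverses.
        T = np.linalg.pinv(C) @ X_new @ np.linalg.pinv(B_old)
    return T, T @ B_old
```
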
"Wow! you are so beautiful today!" “哇!你今天真漂亮!”
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502258
Luoqi Liu, Junliang Xing, Si Liu, Hui Xu, Xi Zhou, Shuicheng Yan
{"title":"\"Wow! you are so beautiful today!\"","authors":"Luoqi Liu, Junliang Xing, Si Liu, Hui Xu, Xi Zhou, Shuicheng Yan","doi":"10.1145/2502081.2502258","DOIUrl":"https://doi.org/10.1145/2502081.2502258","url":null,"abstract":"In this demo, we present Beauty e-Experts, a fully automatic system for hairstyle and facial makeup recommendation and synthesis. Given a user-provided frontal facial image with short/bound hair and no/light makeup, the Beauty e-Experts system can not only recommend the most suitable hairstyle and makeup, but also show the synthesis effects. Two problems are considered for the Beauty e-Experts system: what to recommend and how to wear, which describe a similar process of selecting and applying hairstyle and cosmetics in our daily life. For the what-to-recommend problem, we propose a multiple tree-structured super-graphs model to explore the complex relationships among the beauty attributes, beauty-related attributes and image features, and then based on this model, the most suitable beauty attributes for a given facial image can be efficiently inferred. For the how-to-wear problem, a facial image synthesis module is designed to seamlessly blend the recommended hairstyle and makeup into the user facial image. Extensive experimental evaluations and analysis on testing images well demonstrate the effectiveness of the proposed system.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85056647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502187
Esra Acar, F. Hopfgartner, S. Albayrak
{"title":"Violence detection in hollywood movies by the fusion of visual and mid-level audio cues","authors":"Esra Acar, F. Hopfgartner, S. Albayrak","doi":"10.1145/2502081.2502187","DOIUrl":"https://doi.org/10.1145/2502081.2502187","url":null,"abstract":"Detecting violent scenes in movies is an important video content understanding functionality e.g., for providing automated youth protection services. One key issue in designing algorithms for violence detection is the choice of discriminative features. In this paper, we employ mid-level audio features and compare their discriminative power against low-level audio and visual features. We fuse these mid-level audio cues with low-level visual ones at the decision level in order to further improve the performance of violence detection. We use Mel-Frequency Cepstral Coefficients (MFCC) as audio and average motion as visual features. In order to learn a violence model, we choose two-class support vector machines (SVMs). Our experimental results on detecting violent video shots in Hollywood movies show that mid-level audio features are more discriminative and provide more precise results than low-level ones. The detection performance is further enhanced by fusing the mid-level audio cues with low-level visual ones using an SVM-based decision fusion.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82654947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
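
A minimal sketch of the described pipeline, pooled MFCC audio features, one SVM per modality and SVM-based decision fusion, is shown below; the pooled MFCC statistics are only a stand-in for the mid-level audio representation used in the paper, and the per-shot feature matrices are assumed to be prepared elsewhere:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_clip_feature(wav_path, n_mfcc=13):
    """Mean/std-pooled MFCCs for one shot's audio track (illustrative only)."""
    y, sr = librosa.load(wav_path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def train_decision_fusion(X_audio, X_visual, y):
    """X_audio, X_visual: per-shot feature matrices; y: 0/1 violence labels.
    Train one SVM per modality, then fuse their decision scores with a third SVM."""
    svm_a = SVC(kernel="rbf").fit(X_audio, y)
    svm_v = SVC(kernel="rbf").fit(X_visual, y)
    scores = np.column_stack([svm_a.decision_function(X_audio),
                              svm_v.decision_function(X_visual)])
    fuser = SVC(kernel="linear").fit(scores, y)
    return svm_a, svm_v, fuser
```
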
Multiview semi-supervised ranking for automatic image annotation
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502136
Ali Fakeri-Tabrizi, Massih-Reza Amini, P. Gallinari
{"title":"Multiview semi-supervised ranking for automatic image annotation","authors":"Ali Fakeri-Tabrizi, Massih-Reza Amini, P. Gallinari","doi":"10.1145/2502081.2502136","DOIUrl":"https://doi.org/10.1145/2502081.2502136","url":null,"abstract":"Most photo sharing sites give their users the opportunity to manually label images. The labels collected that way are usually very incomplete due to the size of the image collections: most images are not labeled according to all the categories they belong to, and, conversely, many class have relatively few representative examples. Automated image systems that can deal with small amounts of labeled examples and unbalanced classes are thus necessary to better organize and annotate images. In this work, we propose a multiview semi-supervised bipartite ranking model which allows to leverage the information contained in unlabeled sets of images in order to improve the prediction performance, using multiple descriptions, or views of images. For each topic class, our approach first learns as many view-specific rankers as available views using the labeled data only. These rankers are then improved iteratively by adding pseudo-labeled pairs of examples on which all view-specific rankers agree over the ranking of examples within these pairs. We report on experiments carried out on the NUS-WIDE dataset, which show that the multiview ranking process improves predictive performances when a small number of labeled examples is available specially for unbalanced classes. We show also that our approach achieves significant improvements over a state-of-the art semi-supervised multiview classification model.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82037725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
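
The agreement-based pseudo-labeling step can be sketched with RankSVM-style linear rankers; assume one initial ranker has already been trained per view on the labeled pairs, and note that the pair sampling, linear rankers and retraining scheme below are illustrative simplifications of the model in the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pair_diffs(X, y):
    """RankSVM-style training pairs: (positive - negative) feature differences."""
    D = np.array([p - n for p in X[y == 1] for n in X[y == 0]])
    return np.vstack([D, -D]), np.r_[np.ones(len(D)), -np.ones(len(D))]

def agreed_pseudo_pairs(rankers, views_unlab, i, j):
    """Keep unlabeled pairs (i, j) whose ordering all view-specific rankers agree on."""
    prefs = np.array([np.sign(r.decision_function(X[i]) - r.decision_function(X[j]))
                      for r, X in zip(rankers, views_unlab)])
    agree = np.all(prefs == prefs[0], axis=0) & (prefs[0] != 0)
    return agree, prefs[0]

def refine_once(rankers, views_lab, y_lab, views_unlab, i, j):
    """One iteration: retrain each view-specific ranker on its labeled pairs plus
    the pseudo-labeled pairs on which every view agreed."""
    agree, order = agreed_pseudo_pairs(rankers, views_unlab, i, j)
    new_rankers = []
    for X_l, X_u in zip(views_lab, views_unlab):
        D_l, y_l = pair_diffs(X_l, y_lab)
        D_u = (X_u[i] - X_u[j])[agree] * order[agree, None]
        X_tr = np.vstack([D_l, D_u, -D_u])
        y_tr = np.r_[y_l, np.ones(len(D_u)), -np.ones(len(D_u))]
        new_rankers.append(LinearSVC().fit(X_tr, y_tr))
    return new_rankers
```
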
Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502084
Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, Shipeng Li
{"title":"Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing","authors":"Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, Shipeng Li","doi":"10.1145/2502081.2502084","DOIUrl":"https://doi.org/10.1145/2502081.2502084","url":null,"abstract":"Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. In this paper, we have developed an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a very few seconds of what they are watching. The system is able to index large-scale video data using a new layered audio-video indexing approach in the cloud, as well as extract light-weight joint audio-video signatures in real time and perform progressive search on mobile devices. Unlike most existing mobile video search applications that simply send the original video query to the cloud, the proposed mobile system is one of the first attempts at instant and progressive video search leveraging the light-weight computing capacity of mobile devices. The system is characterized by four unique properties: 1) a joint audio-video signature to deal with the large aural and visual variances associated with the query video captured by the mobile phone, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) light-weight fingerprinting to comply with mobile processing capacity, and 4) a progressive query process to significantly reduce computational costs and improve the user experience---the search process can stop anytime once a confident result is achieved. We have collected 1,400 query videos captured by 25 mobile users from a dataset of 600 hours of video. The experiments show that our system outperforms state-of-the-art methods by achieving 90.79% precision when the query video is less than 10 seconds and 70.07% even when the query video is less than 5 seconds.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82268299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
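
Property 4, the progressive query process, boils down to a loop that keeps extending the captured signature and re-querying until a confident candidate emerges. The sketch below uses hypothetical capture_signature and query_index callbacks and an assumed confidence threshold; it illustrates the control flow only, not the paper's indexing or fingerprinting:

```python
def progressive_search(capture_signature, query_index, max_seconds=10.0,
                       step=1.0, threshold=0.8):
    """Keep fingerprinting the next chunk of the captured clip and re-querying
    the index, stopping as soon as one candidate is confident enough.

    capture_signature(step) is assumed to return the fingerprint of the next
    `step` seconds; query_index(signature) to return (video_id, score) pairs.
    """
    signature, elapsed = [], 0.0
    best_id, best_score = None, 0.0
    while elapsed < max_seconds:
        signature.extend(capture_signature(step))
        elapsed += step
        best_id, best_score = max(query_index(signature), key=lambda c: c[1])
        if best_score >= threshold:          # confident result: stop early
            break
    return best_id, best_score, elapsed
```
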
Motion compensated compressed domain watermarking
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502211
Tanima Dutta
{"title":"Motion compensated compressed domain watermarking","authors":"Tanima Dutta","doi":"10.1145/2502081.2502211","DOIUrl":"https://doi.org/10.1145/2502081.2502211","url":null,"abstract":"The security has become an important issue in multimedia applications. The embedding of watermark bits in compressed domain is less computationally expensive as full decoding and re-encoding is not required. The motion coherency is an essential property to resist temporal frame averaging based attacks. The design of motion compensated embedding method in compressed domain is a challenging task. As far we know, no such embedding method is explored yet. In this paper, we propose a motion compensated compressed domain embedding method within a short video neighborhood that gives acceptable visual quality, embedding capacity, and robustness. The simulation results show the effectiveness of the proposed method.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82375241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Automatic generation of social media snippets for mobile browsing
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502116
Wenyuan Yin, Tao Mei, Chang Wen Chen
{"title":"Automatic generation of social media snippets for mobile browsing","authors":"Wenyuan Yin, Tao Mei, Chang Wen Chen","doi":"10.1145/2502081.2502116","DOIUrl":"https://doi.org/10.1145/2502081.2502116","url":null,"abstract":"The ongoing revolution in media consumption from traditional PCs to the pervasiveness of mobile devices is driving the adoption of social media in our daily lives. More and more people are using their mobile devices to enjoy social media content while on the move. However, mobile display constraints create challenges for presenting and authoring the rich media content on screens with limited display size. This paper presents an innovative system to automatically generate magazine-like social media visual summaries, which is called \"snippet,\" for efficient mobile browsing. The system excerpts the most salient and dominant elements, i.e., a major picture element and a set of textual elements, from the original media content, and composes these elements into a text overlaid image by maximizing information perception. In particular, we investigate a set of aesthetic rules and visual perception principles to optimize the layout of the extracted elements by considering display constraints. As a result, browsing the snippet on mobile devices is just like quickly glancing at a magazine. To the best of our knowledge, this paper represents one of the first attempts at automatic social media snippet generation by studying aesthetic rules and visual perception principles. We have conducted experiments and user studies with social posts from news entities. We demonstrated that the generated snippets are effective at representing media content in a visually appealing and compact way, leading to a better user experience when consuming social media content on mobile devices.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83865903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
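
The layout step can be approximated very crudely: place the text block where it covers the least image detail. The sketch below uses gradient energy and an integral image for fast box scoring; it is a generic stand-in for the paper's aesthetics- and perception-driven optimization, not the authors' method:

```python
import numpy as np

def best_text_region(image_gray, box_h, box_w):
    """Return the (row, col) of the box position with the least underlying
    image detail, as a crude proxy for a visually unobtrusive text overlay."""
    gy, gx = np.gradient(image_gray.astype(float))
    energy = np.abs(gx) + np.abs(gy)
    # Integral image lets us score every candidate box in O(1).
    ii = np.pad(energy.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    H, W = energy.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(H - box_h + 1):
        for x in range(W - box_w + 1):
            s = ii[y + box_h, x + box_w] - ii[y, x + box_w] - ii[y + box_h, x] + ii[y, x]
            if s < best:
                best, best_pos = s, (y, x)
    return best_pos
```
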
Temporal encoded F-formation system for social interaction detection
Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502096
Tian Gan, Yongkang Wong, Daqing Zhang, M. Kankanhalli
{"title":"Temporal encoded F-formation system for social interaction detection","authors":"Tian Gan, Yongkang Wong, Daqing Zhang, M. Kankanhalli","doi":"10.1145/2502081.2502096","DOIUrl":"https://doi.org/10.1145/2502081.2502096","url":null,"abstract":"In the context of a social gathering, such as a cocktail party, the memorable moments are generally captured by professional photographers or by the participants. The latter case is often undesirable because many participants would rather enjoy the event instead of being occupied by the photo-taking task. Motivated by this scenario, we propose the use of a set of cameras to automatically take photos. Instead of performing dense analysis on all cameras for photo capturing, we first detect the occurrence and location of social interactions via F-formation detection. In the sociology literature, F-formation is a concept used to define social interactions, where each detection only requires the spatial location and orientation of each participant. This information can be robustly obtained with additional Kinect depth sensors. In this paper, we propose an extended F-formation system for robust detection of interactions and interactants. The extended F-formation system employs a heat-map based feature representation for each individual, namely Interaction Space (IS), to model their location, orientation, and temporal information. Using the temporally encoded IS for each detected interactant, we propose a best-view camera selection framework to detect the corresponding best view camera for each detected social interaction. The extended F-formation system is evaluated with synthetic data on multiple scenarios. To demonstrate the effectiveness of the proposed system, we conducted a user study to compare our best view camera ranking with human's ranking using real-world data.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88936346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 61
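
The heat-map idea behind the Interaction Space can be illustrated with a toy o-space detector: each person contributes a Gaussian placed a fixed distance in front of them, and grid cells supported by several overlapping maps mark a candidate F-formation. The Gaussian parameters and support threshold below are assumptions, and the temporal encoding from the paper is omitted:

```python
import numpy as np

def interaction_space(pos, theta, grid, sigma=0.6, dist=1.0):
    """Heat map of one person's transactional segment: a Gaussian centred a
    fixed distance in front of the person (a simplified stand-in for the IS)."""
    cx, cy = pos[0] + dist * np.cos(theta), pos[1] + dist * np.sin(theta)
    gx, gy = grid
    return np.exp(-((gx - cx) ** 2 + (gy - cy) ** 2) / (2 * sigma ** 2))

def detect_o_space(people, grid, min_people=2, thresh=0.5):
    """Accumulate everyone's IS maps; cells supported by at least min_people
    overlapping maps are candidate o-space centres of an F-formation."""
    maps = np.array([interaction_space(p, t, grid) for p, t in people])
    support = (maps > thresh).sum(axis=0)
    return support >= min_people

# Example: a 6 m x 6 m room sampled at 10 cm, two people facing each other.
gx, gy = np.meshgrid(np.arange(0, 6, 0.1), np.arange(0, 6, 0.1))
mask = detect_o_space([((2.0, 3.0), 0.0), ((4.0, 3.0), np.pi)], (gx, gy))
```
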