2012 IEEE International Conference on Multimedia and Expo: Latest Publications

Recognition of Multiple-Food Images by Detecting Candidate Regions
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.157
Yuji Matsuda, H. Hoashi, Keiji Yanai
Abstract: In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect candidate regions by fusing the outputs of several region detectors, including Felzenszwalb's deformable part model (DPM) [1], a circle detector, and JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to the bounding boxes of the candidate regions, using various visual features including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradients (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for each multiple-food image in descending order of confidence score. We achieved a 55.8% classification rate on a multiple-food image dataset, improving on the DPM-only baseline by 14.3 points. This demonstrates that the proposed two-step method is effective for recognizing multiple-food images.
Citations: 287
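
The two-step pipeline (fuse region proposals from several detectors, then rank classifier outputs by confidence) can be sketched as follows. This is a minimal illustration in which `detector_outputs` and `classify_region` are hypothetical stand-ins for the paper's DPM, circle detector, JSEG segmentation, and feature-fusion classifier:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_candidates(detector_outputs, iou_thresh=0.5):
    """Merge region proposals from several detectors, dropping near-duplicates."""
    fused = []
    for boxes in detector_outputs:              # one list of boxes per detector
        for box in boxes:
            if all(iou(box, kept) < iou_thresh for kept in fused):
                fused.append(box)
    return fused

def top_k_foods(image, boxes, classify_region, k=10):
    """Classify every candidate region and return the k most confident labels."""
    scored = []
    for x1, y1, x2, y2 in boxes:
        # classify_region is a hypothetical feature-fusion classifier that
        # returns (label, confidence) for a cropped region
        label, conf = classify_region(image[y1:y2, x1:x2])
        scored.append((conf, label, (x1, y1, x2, y2)))
    return sorted(scored, reverse=True)[:k]
```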
Video Copy Detection Using a Soft Cascade of Multimodal Features
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.189
Menglin Jiang, Yonghong Tian, Tiejun Huang
Abstract: In the video copy detection task, it is widely recognized that no single feature works well for all transformations, so more and more approaches adopt a set of complementary features to cope with complex audio-visual transformations. However, most of them use the individual features separately and obtain the final result by fusing the results of several basic detectors, which often leads to low detection efficiency; moreover, several thresholds or parameters must be carefully tuned. To address these problems, we propose a soft cascade approach that integrates multiple features for efficient copy detection. In our approach, the basic detectors are organized in a cascaded framework that processes a query video in sequence until one detector asserts it as a copy. To fully exploit the complementarity of these detectors, a learning algorithm estimates the optimal decision thresholds in the cascade architecture. Excellent performance on the benchmark dataset of the TRECVid 2011 CBCD task demonstrates the effectiveness and efficiency of our approach.
Citations: 18
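
The early-exit control flow of a soft cascade is simple to sketch. Below is a minimal Python illustration, assuming each basic detector exposes a confidence-scoring call; the detectors, thresholds, and their learned ordering are the paper's contribution and are treated as given here:

```python
def detect_copy(query_video, cascade):
    """Run the basic detectors in sequence, stopping at the first assertion.

    `cascade` is a list of (detector, threshold) pairs, typically ordered
    from cheapest to most expensive; each detector is assumed to return a
    confidence score for the query being a copy.
    """
    for detector, threshold in cascade:
        score = detector(query_video)
        if score >= threshold:      # early exit: this detector asserts a copy
            return True, detector.__name__, score
    return False, None, 0.0
```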
SIFT-Based Image Compression
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.52
Huanjing Yue, Xiaoyan Sun, Feng Wu, Jingyu Yang
Abstract: This paper proposes a novel image compression scheme based on a local feature descriptor, the Scale Invariant Feature Transform (SIFT). The SIFT descriptor characterizes an image region invariantly to scale and rotation and is widely used in image retrieval. By using SIFT descriptors, our compression scheme can exploit external image content to reduce visual redundancy among images. The proposed encoder compresses an input image with SIFT descriptors rather than pixel values. To reduce the coding bits, it separates the SIFT descriptors of the image into two groups: a visual description, which is a heavily subsampled image with key SIFT descriptors embedded, and a set of differential SIFT descriptors. The corresponding decoder regenerates the SIFT descriptors from the visual description and the differential set. These descriptors drive our SIFT-based matching, which retrieves candidate predictive patches from a large image dataset; the candidate patches are then integrated into the visual description to produce the final reconstructed image. Our preliminary but promising results demonstrate the effectiveness of the proposed coding scheme in terms of perceptual quality, and show that it provides a feasible way to exploit the visual correlation among images.
Citations: 30
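
The decoder-side retrieval step, matching a region's SIFT descriptors against an external image dataset, can be sketched with OpenCV. This is not the paper's codec, just an illustration of SIFT-based candidate retrieval with Lowe's ratio test:

```python
import cv2

sift = cv2.SIFT_create()

def best_matching_image(query_patch, dataset_images, ratio=0.75):
    """Return the dataset image whose SIFT descriptors best match the query.

    Images are expected as 8-bit grayscale arrays; Lowe's ratio test filters
    out ambiguous descriptor matches.
    """
    _, q_desc = sift.detectAndCompute(query_patch, None)
    if q_desc is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best, best_count = None, 0
    for img in dataset_images:
        _, d_desc = sift.detectAndCompute(img, None)
        if d_desc is None or len(d_desc) < 2:
            continue
        good = 0
        for pair in matcher.knnMatch(q_desc, d_desc, k=2):
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good += 1
        if good > best_count:
            best, best_count = img, good
    return best
```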
Robust Face Super-Resolution Using Free-Form Deformations for Low-Quality Surveillance Video
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.162
Tomonari Yoshida, Tomokazu Takahashi, Daisuke Deguchi, I. Ide, H. Murase
Abstract: Recently, the demand for face recognition to identify persons in surveillance video has rapidly increased. Since surveillance cameras are usually placed far from a person's face, the quality of the captured face images tends to be low, which degrades recognition accuracy. Aiming to improve the accuracy of low-resolution face recognition, we therefore propose a video-based super-resolution method. The proposed method can generate a high-resolution face image from low-resolution video frames containing non-rigid deformations caused by changes in face pose and expression, without using any positional information about facial feature points. Most existing techniques use facial feature points for image alignment between video frames, but it is difficult to locate the feature points accurately in low-resolution face images. Instead, the proposed method uses a free-form deformation method that flexibly aligns each local region between the images, enabling super-resolution of face images from low-resolution videos. Experimental results demonstrate that the proposed method improves super-resolution performance on real videos in terms of both image quality and face recognition accuracy.
Citations: 15
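
A minimal sketch of the multi-frame fusion idea follows, with one plainly labeled substitution: dense TV-L1 optical flow (from scikit-image) stands in for the paper's free-form deformation alignment, and simple temporal averaging stands in for its reconstruction step:

```python
import numpy as np
from scipy.ndimage import map_coordinates
from skimage.registration import optical_flow_tvl1
from skimage.transform import rescale

def fuse_frames(frames, scale=4):
    """Fuse aligned low-res grayscale frames (float arrays) into one image.

    The paper aligns local regions with free-form deformations; this sketch
    swaps in dense TV-L1 optical flow as the non-rigid alignment and plain
    temporal averaging as the reconstruction step.
    """
    ref = frames[0]
    rows, cols = ref.shape
    grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    upsampled = [rescale(ref, scale, order=3)]
    for frame in frames[1:]:
        v, u = optical_flow_tvl1(ref, frame)   # per-pixel displacement field
        warped = map_coordinates(frame, [grid_r + v, grid_c + u], order=3)
        upsampled.append(rescale(warped, scale, order=3))
    return np.mean(upsampled, axis=0)
```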
Bringing Videos to Social Media
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.86
S. Kopf, Stefan Wilk, W. Effelsberg
Abstract: Although the importance of video sharing and of social media is increasing day by day, a full integration of videos into social media has not yet been achieved. We have developed a system that brings the concept of hypervideo, which allows objects in a video to be annotated, to social media. We define this combination as social video: it allows a large number of users to contribute simultaneously to the content of a video. Users can annotate video objects by adding images, text, other videos, Web links, or even discussion topics. An integrated chat system lets users communicate with friends and link these topics to distinct objects in the video. We analyze the technical functionality and user acceptance of our social video system in detail. Thanks to its integration into the social network Facebook, more than 12,000 users have already accessed our system.
Citations: 9
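
The social-video concept implies a data model linking user contributions to tracked video objects over time. A hypothetical minimal schema (an assumption for illustration, not the authors' actual implementation) might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class VideoAnnotation:
    """One user contribution attached to an object in a video."""
    object_id: str     # tracked video object the annotation is linked to
    author: str
    start_s: float     # time range during which the annotation is shown
    end_s: float
    kind: str          # "image" | "text" | "video" | "link" | "topic"
    payload: str       # URL or text body of the contribution

@dataclass
class SocialVideo:
    video_url: str
    annotations: list[VideoAnnotation] = field(default_factory=list)

    def visible_at(self, t: float) -> list[VideoAnnotation]:
        """Annotations to overlay at playback time t."""
        return [a for a in self.annotations if a.start_s <= t <= a.end_s]
```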
Real-Time Storyboard Generation for H.264/AVC Compressed Videos
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.49
Pei Dong, Yong Xia, D. Feng
Abstract: Video summarization enables convenient and efficient management of large volumes of visual data. However, most existing summarization approaches rely on either pixel-domain information or older video compression standards. As the most recent and popular international video coding standard, H.264/AVC adopts a number of advanced techniques that bring not only opportunities but also challenges to video summarization. In this paper, we propose a real-time image storyboard generation algorithm for H.264/AVC compressed videos that uses compressed-domain and pixel-domain information jointly and adaptively. The algorithm extracts compressed-domain information for visual content representation, video structuring, and candidate representative frame selection; by fusing in pixel-domain information, the redundancy among the candidate representative frames is further reduced. Our experimental results show that the proposed algorithm can efficiently produce image storyboards that conform to human interpretation of the essential content in generic videos.
Citations: 2
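
The pixel-domain redundancy-reduction step can be sketched independently of the H.264 parsing. Assuming a compressed-domain pass has already produced candidate representative frames, one simple way to prune near-duplicates is HSV histogram correlation (an illustrative choice, not necessarily the paper's measure):

```python
import cv2

def prune_redundant(frames, sim_thresh=0.9):
    """Drop candidate representative frames that look too similar.

    `frames` are BGR images assumed to come from an upstream compressed-domain
    selection pass; redundancy is measured by HSV histogram correlation.
    """
    kept, kept_hists = [], []
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if all(cv2.compareHist(h, hist, cv2.HISTCMP_CORREL) < sim_thresh
               for h in kept_hists):
            kept.append(frame)
            kept_hists.append(hist)
    return kept
```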
Exploiting Structured Sparsity for Image Deblurring
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.110
Haichao Zhang, Yanning Zhang, Thomas S. Huang
Abstract: Sparsity is a ubiquitous property of many kinds of natural real-world data, such as images, and it has long played an important role in image and multimedia data processing. However, for many such data the sparsity pattern is not completely random: there is structure over the sparse coefficients. By exploiting this structure, we can model the data better and further improve the performance of the recovery algorithm. In this paper, we exploit the structured sparsity of natural images for the image deblurring application. Experimental results clearly demonstrate the effectiveness of the proposed approach.
Citations: 1
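
The abstract does not spell out the model, but one standard way to encode structure over sparse coefficients is a group (l2,1) penalty, whose proximal step shrinks whole groups of coefficients together. A generic numpy sketch of that step, not the paper's exact formulation:

```python
import numpy as np

def group_soft_threshold(coeffs, groups, lam):
    """Proximal step for a group (l2,1) sparsity penalty.

    Each group of coefficients is shrunk toward zero as a block, so whole
    groups switch off together, encoding structure over the sparse
    coefficients.
    """
    out = np.zeros_like(coeffs)
    for idx in groups:              # idx: array of indices forming one group
        g = coeffs[idx]
        norm = np.linalg.norm(g)
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * g
    return out

# usage: 64 coefficients arranged in 16 groups of 4
x = np.random.randn(64)
groups = np.arange(64).reshape(16, 4)
x_sparse = group_soft_threshold(x, groups, lam=1.0)
```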
Position-Patch Based Face Hallucination via Locality-Constrained Representation
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.152
Junjun Jiang, R. Hu, Zhen Han, T. Lu, Kebin Huang
Abstract: Instead of using probabilistic-graph-based or manifold-learning-based models, several recent face hallucination approaches are based on position patches. To obtain the optimal weights for hallucination, they represent an image patch with the patches at the same position in the training face images, using least squares estimation or convex optimization. However, these schemes can neither provide unbiased solutions nor satisfy locality conditions, so the resulting patch representation is not the best. In this paper, we develop a simpler but more effective representation scheme, Locality-constrained Representation (LcR), and compare it with Least Square Representation (LSR) and Sparse Representation (SR). LcR imposes a locality constraint on the least squares inversion problem to achieve sparsity and locality simultaneously. Experimental results demonstrate the superiority of the proposed method over several state-of-the-art face hallucination approaches.
Citations: 73
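
The locality-constrained weighting admits a simple closed form when written as distance-weighted ridge regression over the same-position training patches. The sketch below illustrates that idea; the sum-to-one normalization is a stand-in for whatever constraint the paper actually imposes:

```python
import numpy as np

def locality_constrained_weights(x, D, lam=0.04):
    """Weights reconstructing patch x from same-position training patches D.

    Minimizes ||x - D w||^2 + lam * ||d * w||^2, where d holds the distances
    from x to each column of D, so distant training patches are heavily
    penalized and their weights driven toward zero.
    """
    d = np.linalg.norm(D - x[:, None], axis=0)   # distance to each atom
    A = D.T @ D + lam * np.diag(d ** 2)
    w = np.linalg.solve(A, D.T @ x)
    return w / w.sum()                           # illustrative normalization

def hallucinate_patch(w, D_high):
    """Apply the same weights to the high-res counterparts of the patches."""
    return D_high @ w
```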
Real-Time Hand Pose Estimation from RGB-D Sensor
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.48
Y. Yao, Y. Fu
Abstract: Hand pose estimation in cluttered environments is always challenging. In this paper, we address the problem of hand pose estimation from an RGB-D sensor. To achieve robust real-time usability, we first design a data acquisition strategy that uses a color glove to label the different hand parts, and we collect a new training data set. We then present a novel hand pose estimation framework in which feature fusion drives hand localization and hand-part classification. Moreover, instead of an articulated model, we design a simplified and efficient 3D contour model to support real-time operation, which does not require a large amount of training data. Experiments show that our approach can handle real-time hand interaction in a desktop environment with a cluttered background.
Citations: 32
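
Per-pixel hand-part classification of the kind the glove-labeled data enables is often done with depth-difference features and a random forest. A hedged sketch follows; the offsets, forest size, and feature design are illustrative assumptions, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# probe offsets in (pixel * depth) units; purely illustrative values
OFFSETS = [(-60, 0), (60, 0), (0, -60), (0, 60), (-40, -40), (40, 40)]

def depth_offset_features(depth, pixels):
    """Depth-difference features around each pixel of interest."""
    h, w = depth.shape
    feats = []
    for r, c in pixels:
        z = depth[r, c] if depth[r, c] > 0 else 1.0
        row = []
        for dr, dc in OFFSETS:
            # scale probes by inverse depth for rough depth invariance
            rr = int(np.clip(r + dr / z, 0, h - 1))
            cc = int(np.clip(c + dc / z, 0, w - 1))
            row.append(depth[rr, cc] - depth[r, c])
        feats.append(row)
    return np.asarray(feats)

# train on pixels whose hand-part labels come from the color glove:
# clf = RandomForestClassifier(n_estimators=50, max_depth=12)
# clf.fit(depth_offset_features(depth, train_pixels), train_labels)
```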
Discovering Social Photo Navigation Patterns
2012 IEEE International Conference on Multimedia and Expo Pub Date: 2012-07-09 DOI: 10.1109/ICME.2012.96
Luca Chiarandini, Michele Trevisiol, A. Jaimes
Abstract: In general, user browsing behavior has been examined within specific tasks (e.g., search) or in the context of particular web sites or services (e.g., shopping sites). However, with the growth of social networks and the proliferation of many different types of web services (e.g., news aggregators, blogs, forums), the web can be viewed as an ecosystem in which a user's actions on a particular web service may be influenced by the service they arrived from (e.g., do users browse similarly whether they arrive at a website via search or via links in aggregators?). In particular, since photos from services like Flickr are used extensively throughout the web, visitors commonly arrive at the site via links on many different types of web sites. In this paper, we start from the hypothesis that visitors to social sites such as Flickr behave differently depending on where they come from. We analyze a large sample of Flickr user logs to discover social photo navigation patterns: we classify pages within Flickr into different categories (e.g., "add a friend" page, "single photo" page), and by clustering sessions we discover important differences in social photo navigation that depend on the type of site users visited before Flickr. Our work is the first to examine photo navigation patterns in Flickr while taking the referrer domain into account. The analysis contributes to a better understanding of how people use photo services like Flickr, and it can inform the design of user modeling and recommendation algorithms, among others.
Citations: 6
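
Session clustering over page-category counts is straightforward to sketch with scikit-learn. The category list below is a hypothetical stand-in for the paper's Flickr page taxonomy:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# hypothetical page categories, standing in for the paper's Flickr taxonomy
CATEGORIES = ["single_photo", "photostream", "add_friend", "search", "group"]

def session_vector(session):
    """Represent one session as counts of the page categories it visited."""
    return np.array([sum(page == c for page in session) for c in CATEGORIES],
                    dtype=float)

def cluster_sessions(sessions, k=5):
    """Cluster sessions to surface recurring navigation patterns."""
    X = normalize(np.array([session_vector(s) for s in sessions]))
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return model.labels_, model.cluster_centers_
```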