2015 IEEE International Conference on Multimedia and Expo (ICME) — Latest Publications

Multi-modal learning for gesture recognition
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177460
Congqi Cao, Yifan Zhang, Hanqing Lu
Abstract: With the development of sensing equipment, data from different modalities is available for gesture recognition. In this paper, we propose a novel multi-modal learning framework. A coupled hidden Markov model (CHMM) is employed to discover the correlation and complementary information across different modalities. The framework supports two configurations: multi-modal learning with multi-modal testing, where all the modalities used during learning remain available during testing; and multi-modal learning with single-modal testing, where only one modality is available during testing. Experiments on two real-world gesture recognition data sets demonstrate the effectiveness of the framework, with improvements observed in both the multi-modal and single-modal testing settings.
Citations: 10
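
The paper's CHMM couples the hidden states of the different modalities; as a much simpler point of reference, the hypothetical sketch below (using the hmmlearn library, with made-up data shapes) trains an independent Gaussian HMM per gesture class and per modality and fuses their log-likelihoods at test time, which likewise supports both multi-modal and single-modal testing.

```python
# Hypothetical sketch: late fusion of independent per-modality HMMs.
# The paper's coupled HMM (CHMM) models cross-modal state dependencies;
# this only combines per-modality log-likelihoods, a simpler baseline.
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def train_models(train_data, n_states=5):
    """train_data: {gesture_class: {modality: list of (T_i, D_m) frame arrays}}"""
    models = {}
    for gesture, modalities in train_data.items():
        models[gesture] = {}
        for modality, sequences in modalities.items():
            X = np.concatenate(sequences)            # stack all frames
            lengths = [len(s) for s in sequences]    # per-sequence lengths
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
            m.fit(X, lengths)
            models[gesture][modality] = m
    return models

def classify(models, test_sample, weights=None):
    """test_sample: {modality: (T, D_m) array}; weights: per-modality fusion weights."""
    weights = weights or {m: 1.0 for m in test_sample}
    scores = {}
    for gesture, per_modality in models.items():
        # Sum weighted log-likelihoods over whichever modalities are present,
        # so the same models cover multi-modal and single-modal testing.
        scores[gesture] = sum(weights[mod] * per_modality[mod].score(obs)
                              for mod, obs in test_sample.items())
    return max(scores, key=scores.get)
```
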
VTouch: Vision-enhanced interaction for large touch displays
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177390
Yinpeng Chen, Zicheng Liu, P. Chou, Zhengyou Zhang
Abstract: We propose a system that augments touch input with visual understanding of the user to improve interaction with a large touch-sensitive display. A commodity color-plus-depth sensor such as the Microsoft Kinect adds the visual modality and enables new interactions beyond touch. Through visual analysis, the system understands where the user is, who the user is, and what the user is doing, even before the user touches the display. This information enhances interaction in multiple ways: a user can use simple gestures to bring up menu items such as a color palette or soft keyboard; menu items can appear where the user is and follow the user; hovering can show information before the user commits to a touch; the user can perform different functions (for example, writing and erasing) with different hands; and each user's preference profile can be maintained separately from other users'. User studies show that participants highly value these and other enhanced interactions.
Citations: 2
Distributed cooperative video coding for wireless video broadcast system
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177521
Mengyao Sun, Yumei Wang, Hao Yu, Yu Liu
Abstract: In wireless video broadcast systems, analog joint source-channel coding (JSCC) has an advantage over conventional separate digital source/channel coding in that it gracefully avoids the cliff effect. Moreover, analog JSCC requires little computation at the encoder and adapts well to varying channel conditions, which makes it well suited to wireless cooperative scenarios. In this paper, we propose a distributed cooperative video coding (DCVC) scheme for wireless video broadcast systems. The scheme is based on the transmission structure of Softcast and borrows the basic idea of distributed video coding. Unlike earlier cooperative video delivery methods, DCVC uses analog coding and coset coding to avoid the cliff effect and make the best use of the transmission power. Experimental results show that DCVC outperforms conventional WSVC and H.264/SVC cooperative schemes, especially when the cooperative channel is worse than the original source-terminal channel.
Citations: 7
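
DCVC builds on Softcast's transmission structure. The rough sketch below is an assumption-laden illustration of that Softcast-style analog step, not the DCVC encoder: a frame is transformed, its coefficients are grouped into chunks, and each chunk is scaled by a power-allocation gain proportional to the inverse fourth root of its variance. Chunk size, power budget, and variable names are arbitrary choices.

```python
# Rough sketch of Softcast-style analog coding: per-chunk power allocation
# with gains proportional to (chunk variance)^(-1/4), then analog transmission.
import numpy as np
from scipy.fft import dctn

def softcast_encode(frame, chunk=8, total_power=1.0):
    """frame: 2D float array whose dimensions are divisible by `chunk`."""
    coeffs = dctn(frame, norm="ortho")                 # frequency-domain frame
    h, w = coeffs.shape
    blocks = coeffs.reshape(h // chunk, chunk, w // chunk, chunk).swapaxes(1, 2)
    variances = blocks.var(axis=(2, 3)) + 1e-12        # per-chunk signal variance
    gains = variances ** -0.25                         # analog power-allocation gains
    power = (gains ** 2 * variances).mean()            # expected per-chunk transmit power
    gains *= np.sqrt(total_power / power)              # meet the power budget
    scaled = blocks * gains[:, :, None, None]          # analog "symbols" per chunk
    return scaled, gains                               # receiver needs the gains as metadata
```
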
Flickr circles: Mining socially-aware aesthetic tendency
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177384
Luming Zhang, Roger Zimmermann
Abstract: Aesthetic tendency discovery is a useful and interesting application in social media. This paper proposes to categorize large-scale Flickr users into multiple circles, where each circle contains users with similar aesthetic interests (e.g., landscapes or abstract paintings). We observe that (1) an aesthetic model should be flexible, since different visual features may be needed to describe different image sets, and (2) the number of photos per user varies significantly, and some users have very few photos. Therefore, a regularized topic model is proposed to quantify each user's aesthetic interest as a distribution in a latent space. A graph is then built to describe the similarity of aesthetic interests among users, in which densely connected users share similar aesthetic interests. An efficient dense subgraph mining algorithm is thus adopted to group users into different circles. Experiments show that our approach accurately detects circles on an image set crawled from over 60,000 Flickr users.
Citations: 7
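
To illustrate the grouping step, the hypothetical sketch below builds a cosine-similarity graph over users' topic distributions and peels it greedily to recover one dense subgraph (a Charikar-style 2-approximation of the densest subgraph). It stands in for the paper's dense subgraph mining algorithm; the similarity threshold and data layout are assumptions.

```python
# Hypothetical sketch: similarity graph over users' aesthetic-interest
# distributions, followed by greedy densest-subgraph peeling.
import numpy as np

def densest_circle(user_topics, threshold=0.8):
    """user_topics: (n_users, n_topics) rows of topic proportions."""
    X = user_topics / np.linalg.norm(user_topics, axis=1, keepdims=True)
    sim = X @ X.T                                   # cosine similarity between users
    np.fill_diagonal(sim, 0.0)
    adj = sim >= threshold                          # edge if interests are similar enough
    alive = set(range(len(X)))
    degree = adj.sum(axis=1).astype(float)
    best, best_density = set(alive), degree.sum() / (2 * len(alive))
    while len(alive) > 1:
        v = min(alive, key=lambda u: degree[u])     # drop the lowest-degree user
        alive.remove(v)
        for u in np.nonzero(adj[v])[0]:
            if u in alive:
                degree[u] -= 1
        density = sum(degree[u] for u in alive) / (2 * len(alive))
        if density > best_density:                  # keep the densest set seen so far
            best, best_density = set(alive), density
    return best                                     # indices of one densely connected circle
```
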
Structure-preserving Image Quality Assessment
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177436
Yilin Wang, Qiang Zhang, Baoxin Li
Abstract: Perceptual Image Quality Assessment (IQA) has many applications. Existing IQA approaches typically work in only one of three scenarios: full-reference, no-reference, or reduced-reference. Techniques that attempt to incorporate image structure information often rely on hand-crafted features, making them difficult to extend to different scenarios. On the other hand, objective metrics such as Mean Square Error (MSE), while easy to compute, are often deemed ineffective for measuring perceptual quality. This paper presents a novel approach to perceptual quality assessment that develops an MSE-like metric, retaining MSE's inexpensive computation and universal applicability while allowing the structural information of an image to be taken into consideration. The latter is achieved by introducing structure-preserving kernelization into an MSE-like formulation. We show that the method leads to competitive FR-IQA results. Further, by developing a feature coding scheme based on this formulation, we extend the model to improve the performance of NR-IQA methods. We report extensive experiments illustrating results from both our FR-IQA and NR-IQA algorithms in comparison with existing state-of-the-art methods.
Citations: 3
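
As a loose illustration of an MSE-like metric evaluated in a kernel-induced space (not the paper's formulation), the sketch below compares mean-removed local patches through a Gaussian kernel via the identity d² = k(x,x) + k(y,y) − 2k(x,y), so that local structure rather than raw intensity drives the score. Patch size and kernel bandwidth are arbitrary assumptions.

```python
# Illustrative kernel-space MSE-like distance on mean-removed local patches.
import numpy as np

def patches(img, size=8, stride=8):
    h, w = img.shape
    out = [img[i:i + size, j:j + size].ravel()
           for i in range(0, h - size + 1, stride)
           for j in range(0, w - size + 1, stride)]
    return np.asarray(out, dtype=float)

def kernelized_mse(img_a, img_b, size=8, gamma=1e-3):
    A, B = patches(img_a, size), patches(img_b, size)
    A -= A.mean(axis=1, keepdims=True)              # remove local mean: keep structure
    B -= B.mean(axis=1, keepdims=True)
    sq = ((A - B) ** 2).sum(axis=1)                 # plain per-patch squared error
    k_ab = np.exp(-gamma * sq)                      # Gaussian kernel k(a, b)
    # k(a,a) = k(b,b) = 1 for the Gaussian kernel, so the kernel-space distance is:
    d2 = 2.0 - 2.0 * k_ab
    return d2.mean()                                # lower = more similar, like MSE
```
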
A framework of extracting multi-scale features using multiple convolutional neural networks
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177449
Kuan-Chuan Peng, Tsuhan Chen
Abstract: Most work on convolutional neural networks (CNN) uses the traditional CNN framework, which extracts features at only one scale. We propose multi-scale convolutional neural networks (MSCNN), which not only extract multi-scale features but also address the issues of previous methods that use CNNs to extract multi-scale features. Under the assumption of the label-inheritable (LI) property, we also propose a method to generate exponentially more training examples for MSCNN from a given training set. Our experimental results show that MSCNN outperforms both the state-of-the-art methods and the traditional CNN framework on artist, artistic style, and architectural style classification, supporting the claim that MSCNN outperforms the traditional CNN framework on tasks that at least partially satisfy the LI property.
Citations: 31
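
A minimal PyTorch sketch of the multi-scale, multiple-network idea follows; the scales, channel counts, and layers are assumptions and do not reproduce the paper's architecture. Each scale gets its own small CNN, and the globally pooled features from all branches are concatenated before classification.

```python
# Minimal sketch: one small CNN per input scale, pooled features concatenated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCNN(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25), feat_dim=64, n_classes=10):
        super().__init__()
        self.scales = scales
        # One independent CNN per scale, as in the multiple-network framework.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
            )
            for _ in scales
        ])
        self.classifier = nn.Linear(feat_dim * len(scales), n_classes)

    def forward(self, x):
        feats = []
        for scale, branch in zip(self.scales, self.branches):
            xs = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            f = branch(xs)
            feats.append(F.adaptive_avg_pool2d(f, 1).flatten(1))  # (N, feat_dim)
        return self.classifier(torch.cat(feats, dim=1))           # fuse all scales

# Example: logits = MultiScaleCNN()(torch.randn(2, 3, 224, 224))
```
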
Exploring feature space with semantic attributes
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177441
Junjie Cai, Richang Hong, Meng Wang, Q. Tian
Abstract: Indexing is a critical step in searching digital images in a large database. To date, designing a discriminative and compact indexing strategy remains challenging, partly due to the well-known semantic gap between user queries and the rich semantics of large-scale datasets. In this paper, we propose to construct a novel joint semantic-visual space by leveraging visual descriptors and semantic attributes, aiming to narrow the semantic gap by bringing both attributes and indexing into one framework. This joint space enables coherent semantic-visual indexing, which employs binary codes to boost retrieval speed with satisfactory accuracy. To solve the proposed model effectively, this paper makes three contributions. First, we propose an interactive optimization method to find the joint space of semantic and visual descriptors. Second, we prove the convergence of our optimization algorithm, which guarantees that the system finds a good solution within a certain number of rounds. Finally, we integrate the joint semantic-visual space with spectral hashing, which enables efficient search over million-scale datasets. Experiments on two standard retrieval datasets, Holidays1M and Oxford5K, show that the proposed method achieves promising performance compared with the state of the art.
Citations: 5
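
The paper couples its joint space with spectral hashing; as a simplified stand-in, the sketch below binarizes descriptors by projecting onto principal directions and thresholding at the median, then ranks database items by Hamming distance. The class name, code length, and example data are illustrative assumptions, not the paper's method.

```python
# Simplified binary indexing: PCA projection + median thresholding + Hamming search.
import numpy as np

class PCAHasher:
    """Fit on a database of descriptors; encode() maps descriptors to n_bits-bit codes."""
    def __init__(self, n_bits=32):
        self.n_bits = n_bits

    def fit(self, X):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.W = Vt[: self.n_bits].T                         # top principal directions
        self.thresholds = np.median((X - self.mean) @ self.W, axis=0)
        return self

    def encode(self, X):
        # One bit per direction: above or below the database median.
        return ((X - self.mean) @ self.W > self.thresholds).astype(np.uint8)

def search(query_code, db_codes, k=5):
    hamming = (query_code ^ db_codes).sum(axis=1)            # distance in code space
    return np.argsort(hamming)[:k]                           # indices of top-k neighbours

# Example usage with random descriptors:
# db = np.random.randn(10000, 128); hasher = PCAHasher(32).fit(db)
# codes = hasher.encode(db); top = search(hasher.encode(db[:1])[0], codes)
```
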
Single image super-resolution via 2D sparse representation
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177485
Na Qi, Yunhui Shi, Xiaoyan Sun, Wenpeng Ding, Baocai Yin
Abstract: Image super-resolution with a sparsity prior provides promising performance. However, traditional sparsity-based super-resolution methods transform a two-dimensional (2D) image into a one-dimensional (1D) vector, which ignores the intrinsic 2D structure and spatial correlation inherent in images. In this paper, we propose the first image super-resolution method that reconstructs a high-resolution image from its low-resolution counterpart via a two-dimensional sparse model. Correspondingly, we present a new dictionary learning algorithm that fully exploits the correspondence between the pairs of 2D dictionaries for low- and high-resolution images. Experimental results demonstrate that our 2D sparse model outperforms state-of-the-art 1D sparse model based super-resolution methods in terms of both reconstruction quality and memory usage.
Citations: 12
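
The 2D sparse model represents a patch as D1 S D2ᵀ with a sparse coefficient matrix S, which is mathematically equivalent to the 1D model vec(X) = (D2 ⊗ D1) vec(S) but far cheaper to store, since the Kronecker dictionary is never formed. The small numerical check below (patch size, atom counts, and random data are assumptions) verifies the equivalence and the storage gap.

```python
# Numerical check: 2D sparse synthesis X = D1 @ S @ D2.T equals its 1D Kronecker form.
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 16                                   # patch side length, atoms per dictionary
D1 = rng.standard_normal((n, k))               # dictionary acting on rows
D2 = rng.standard_normal((n, k))               # dictionary acting on columns

S = np.zeros((k, k))
idx = rng.choice(k * k, size=5, replace=False) # 5 active 2D atoms (the sparse code)
S.flat[idx] = rng.standard_normal(5)

X = D1 @ S @ D2.T                              # 2D synthesis of an n x n patch

# Equivalent 1D view: vectorize (column-major) with the Kronecker-product dictionary.
x_vec = np.kron(D2, D1) @ S.flatten(order="F")
assert np.allclose(x_vec, X.flatten(order="F"))
print("2D model matches its 1D Kronecker form; dictionary storage:",
      2 * n * k, "entries (2D) vs", (n * n) * (k * k), "entries (1D)")
```
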
Optimization of the number of rays in interpolation for light field based free viewpoint systems
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177463
H. Shidanshidi, F. Safaei, W. Li
Abstract: Light field (LF) rendering is widely used in free viewpoint video (FVV) systems. Various methods have been proposed that employ depth maps to improve rendering quality; however, depth estimation is often error-prone. In this paper, a new method based on the concept of effective sampling density (ESD) is proposed for evaluating depth-based LF rendering algorithms under different levels of depth-estimation error. In addition, for a given rendering quality, we provide an estimate of the number of rays required in the interpolation algorithm to compensate for the adverse effect of errors in the depth maps. The proposed method is particularly useful for designing a rendering algorithm that achieves a required rendering quality despite inaccurate depth knowledge. Both theoretical study and numerical simulations verify the efficacy of the proposed method.
Citations: 1
A probabilistic model for food image recognition in restaurants
2015 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177464
Luis Herranz, Ruihan Xu, Shuqiang Jiang
Abstract: A large number of food photos are taken in restaurants for diverse reasons. The resulting dish recognition problem is very challenging, due to different cuisines, cooking styles, and the intrinsic difficulty of modeling food from its visual appearance. Contextual knowledge is crucial to improve recognition in such a scenario; in particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about restaurant menus and the geolocation of restaurants and test images. We first adapt a framework based on discarding unlikely categories located far from the test image, and then reformulate the problem using a probabilistic model connecting dishes, restaurants, and geolocations. We apply the model to three tasks: dish recognition, restaurant recognition, and geolocation refinement. Experiments on a dataset covering 187 restaurants and 701 dishes show that combining multiple sources of evidence (visual, geolocation, and external knowledge) boosts performance on all tasks.
Citations: 21
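
A hedged sketch of the kind of fusion the abstract describes follows: a dish posterior is obtained by combining visual classifier scores with a geolocation-based restaurant prior and the restaurants' menus. The Gaussian distance prior, the uniform-over-menu assumption, and all names and numbers are illustrative choices, not the paper's exact model.

```python
# Illustrative fusion of visual evidence, restaurant geolocation, and menus.
import numpy as np

def dish_posterior(visual_scores, restaurant_xy, menus, user_xy, sigma_m=200.0):
    """
    visual_scores: (n_dishes,) classifier scores, proportional to P(image | dish).
    restaurant_xy: (n_rest, 2) restaurant coordinates in metres; user_xy: (2,).
    menus: (n_rest, n_dishes) binary matrix, 1 if the restaurant serves the dish.
    """
    d2 = ((restaurant_xy - user_xy) ** 2).sum(axis=1)
    p_rest = np.exp(-d2 / (2 * sigma_m ** 2))                     # P(restaurant | location)
    p_rest /= p_rest.sum()
    p_dish_given_rest = menus / menus.sum(axis=1, keepdims=True)  # uniform over each menu
    prior = p_rest @ p_dish_given_rest                            # P(dish | location)
    post = visual_scores * prior                                  # combine visual and context
    return post / post.sum()

# Example with 2 restaurants and 3 dishes:
# post = dish_posterior(np.array([0.2, 0.5, 0.3]),
#                       np.array([[0., 0.], [500., 0.]]),
#                       np.array([[1, 1, 0], [0, 1, 1]]),
#                       np.array([10., 0.]))
```
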