Latest publications from the 2017 IEEE International Conference on Multimedia and Expo (ICME)

Non-rigid feature matching for image retrieval using global and local regularizations
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019441
Yong Ma, Huabing Zhou, Jun Chen, Jingshu Shi, Zhongyuan Wang
Abstract: In this paper, we propose a probabilistic method for feature matching of near-duplicate images undergoing non-rigid transformations. We start by creating a set of putative correspondences based on feature similarity, and then focus on removing outliers from the putative set while estimating the transformation. This is formulated as maximum likelihood estimation of a Bayesian model with latent variables indicating whether matches in the putative set are inliers or outliers. We impose a non-parametric global geometric constraint on the correspondences using Tikhonov regularizers in a reproducing kernel Hilbert space, and also introduce a local geometric constraint to preserve local structures among neighboring feature points. The problem is solved with the Expectation Maximization algorithm, and a closed-form solution for the transformation is derived in the maximization step. Moreover, a fast implementation based on sparse approximation is given, which reduces the computational complexity to linearithmic without sacrificing performance. Extensive experiments on real near-duplicate images for both feature matching and image retrieval demonstrate that the proposed method produces accurate results and outperforms current state-of-the-art methods, especially in the presence of severe outliers.
Citations: 0
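The abstract describes an EM algorithm with latent inlier indicators and a kernel-regularized transformation solved in closed form. As a hedged illustration only, here is a minimal NumPy sketch of that general formulation (a robust fit of a displacement field in an RKHS); the kernel width, regularization weight, outlier density, and toy data are all assumptions, and the local constraint and the sparse linearithmic approximation are omitted.

    import numpy as np

    def gaussian_kernel(X, Y, beta):
        # Gram matrix of k(x, y) = exp(-beta * ||x - y||^2), the RKHS kernel
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-beta * d2)

    def em_match(X, Y, beta=0.1, lam=3.0, gamma=0.9, a=10.0, n_iter=50):
        # X, Y: N x 2 putative correspondences; model the displacement field
        # Y - X = K @ C with Tikhonov (kernel ridge) regularization and a
        # latent indicator saying whether each match is an inlier.
        N, D = X.shape
        K = gaussian_kernel(X, X, beta)
        C = np.zeros((N, D))
        V = Y - X                                  # displacements to explain
        sigma2 = (V ** 2).sum() / (N * D)
        for _ in range(n_iter):
            # E-step: posterior inlier probability of each putative match
            r2 = ((V - K @ C) ** 2).sum(1)
            p_in = gamma * np.exp(-r2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
            P = p_in / (p_in + (1 - gamma) / a)    # 1/a: uniform outlier density
            # M-step: closed-form coefficients of the regularized field
            C = np.linalg.solve(np.diag(P) @ K + lam * sigma2 * np.eye(N),
                                P[:, None] * V)
            r2 = ((V - K @ C) ** 2).sum(1)
            sigma2 = (P * r2).sum() / (P.sum() * D)
            gamma = P.mean()
        return P > 0.5

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (100, 2))
    Y = X + 0.02 * np.sin(4 * X) + 0.001 * rng.normal(size=X.shape)  # smooth warp
    Y[:15] = rng.uniform(0, 1, (15, 2))            # inject outliers
    print("inliers kept:", em_match(X, Y).sum())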
Learning deep and sparse feature representation for fine-grained object recognition
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019386
M. Srinivas, Yen-Yu Lin, H. Liao
Abstract: In this paper, we address fine-grained classification, which is quite challenging due to high intra-class variations and subtle inter-class variations. Most modern approaches to fine-grained recognition are built on convolutional neural networks (CNN). Despite their effectiveness, these approaches still suffer from two major problems. First, they rely heavily on large sets of training data, but manually annotating numerous training samples is expensive. Second, the feature representations learned by these approaches are often high-dimensional, which reduces efficiency. To tackle the two problems, we present an approach in which on-line dictionary learning is integrated into a CNN. The dictionaries can be learned incrementally by leveraging a vast amount of weakly labeled data from the Internet. With these dictionaries, all training and testing data can be sparsely represented. Our approach is evaluated and compared with state-of-the-art approaches on the benchmark CUB-200-2011 dataset. The promising results demonstrate its superiority in both efficiency and accuracy.
Citations: 8
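No code accompanies the abstract; the sketch below shows the two generic building blocks it names — sparse coding (via ISTA here) and an incremental, on-line dictionary update — on placeholder feature vectors. Dimensions, step sizes, and the update rule are illustrative assumptions, not the authors' integrated CNN pipeline.

    import numpy as np

    def ista(D, x, lam=0.1, n_iter=100):
        # Sparse-code feature x over dictionary D (d x k) by iterative
        # soft-thresholding; returns a sparse coefficient vector.
        L = np.linalg.norm(D, 2) ** 2            # Lipschitz const. of gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            g = a - D.T @ (D @ a - x) / L
            a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        return a

    def online_dict_step(D, x, lam=0.1, lr=0.05):
        # One on-line update: code the incoming (weakly labeled) sample,
        # then nudge the dictionary toward reconstructing it.
        a = ista(D, x, lam)
        D += lr * np.outer(x - D @ a, a)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
        return D, a

    rng = np.random.default_rng(0)
    D = rng.normal(size=(128, 64))               # 64 atoms over 128-d features
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(200):                         # stream of feature vectors
        D, code = online_dict_step(D, rng.normal(size=128))
    print("nonzeros in last code:", int((code != 0).sum()))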
Leveraging geometric correlation for input-adaptive facial landmark regression
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019469
Yuyao Feng, Risheng Liu, Xin Fan, Kang Huyan, Zhongxuan Luo
Abstract: Facial analysis plays a very important role in many vision applications, such as authentication and entertainment. The earliest works, in the 1990s, mostly focused on estimating geometric deformations of facial landmarks to address this task, while in the past several years more and more effort has been devoted to directly learning an appearance regression for facial analysis. Though regressions trained on controlled facial images can successfully capture appearance variations, the performance of these appearance-based models is tightly tied to the quantity and quality of the training data. In this paper, we develop a novel framework, named geometric correlated landmark regression (GCLR), to inherit the advantages of these two categories of methods while overcoming their limitations. Specifically, we first establish a landmark-to-landmark regression to estimate the geometry of facial images. By further incorporating a sparse coding term into the regression framework, we can leverage the geometric correlations between the test image and the shape dictionary, thus significantly enhancing geometry regression performance. Experimental results on various challenging facial datasets verify the effectiveness and efficiency of GCLR.
Citations: 1
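As a rough, assumption-laden illustration of the sparse-coding idea in GCLR — representing a test shape over a dictionary of training shapes — the following sketch uses orthogonal matching pursuit over a random toy shape dictionary; the dictionary construction and the joint regression are not the paper's actual formulation.

    import numpy as np

    def omp(D, x, n_nonzero=8):
        # Orthogonal matching pursuit: greedy sparse coding over dictionary D.
        residual, idx = x.copy(), []
        coef = np.zeros(0)
        for _ in range(n_nonzero):
            j = int(np.argmax(np.abs(D.T @ residual)))
            if j not in idx:
                idx.append(j)
            coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
            residual = x - D[:, idx] @ coef
        a = np.zeros(D.shape[1])
        a[idx] = coef
        return a

    def correct_landmarks(shape_dict, landmarks):
        # Sparse-code a noisy landmark estimate over a dictionary of training
        # shapes; the reconstruction acts as the geometry-regularized estimate.
        a = omp(shape_dict, landmarks.ravel())
        return (shape_dict @ a).reshape(-1, 2)

    rng = np.random.default_rng(0)
    shape_dict = rng.normal(size=(136, 50))      # 50 shapes, 68 landmarks each
    noisy = rng.normal(size=(68, 2))             # hypothetical regression output
    print(correct_landmarks(shape_dict, noisy).shape)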
Keyword-driven image captioning via Context-dependent Bilateral LSTM
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019525
Xiaodan Zhang, Shengfeng He, Xinhang Song, Pengxu Wei, Shuqiang Jiang, Qixiang Ye, Jianbin Jiao, Rynson W. H. Lau
Abstract: Image captioning has recently received much attention. Existing approaches, however, are limited to describing images with simple contextual information; they typically generate one sentence per image with only a single contextual emphasis. In this paper, we address this limitation from a user perspective with a novel approach. Given some keywords as additional inputs, the proposed method generates varied descriptions that follow the provided guidance, so descriptions with different focuses can be generated for the same image. Our method is based on a new Context-dependent Bilateral Long Short-Term Memory (CDB-LSTM) model that predicts a keyword-driven sentence by considering word dependence. The word dependence is explored externally with a bilateral pipeline, and internally with a unified, joint training process. Experiments on the MS COCO dataset demonstrate that the proposed approach not only significantly outperforms the baseline method but also shows good adaptation and consistency with various keywords.
Citations: 6
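The bilateral, keyword-anchored decoding is the distinctive part of CDB-LSTM. The toy sketch below (random untrained weights, embedding size equal to hidden size, greedy decoding) only illustrates the control flow of growing a sentence leftward and rightward from a keyword; it is not the authors' trained model.

    import numpy as np

    rng = np.random.default_rng(0)
    V, H = 50, 32                                # toy vocab and hidden sizes
    emb = rng.normal(0, 0.1, (V, H))             # embedding dim = H for brevity
    Wout = rng.normal(0, 0.1, (V, H))            # hidden-to-vocab projection
    Wf = rng.normal(0, 0.1, (4 * H, 2 * H))      # forward-chain LSTM weights
    Wb = rng.normal(0, 0.1, (4 * H, 2 * H))      # backward-chain LSTM weights

    def lstm_step(W, x, h, c):
        # Minimal LSTM cell: gates computed from [input; hidden].
        z = W @ np.concatenate([x, h])
        i, f, o, g = np.split(z, 4)
        sig = lambda v: 1 / (1 + np.exp(-v))
        c = sig(f) * c + sig(i) * np.tanh(g)
        return np.tanh(c) * sig(o), c

    def decode_from(keyword_id, W, steps=5):
        # Greedily decode a chain of word ids starting at the keyword.
        h = c = np.zeros(H)
        w, out = keyword_id, []
        for _ in range(steps):
            h, c = lstm_step(W, emb[w], h, c)
            w = int(np.argmax(Wout @ h))
            out.append(w)
        return out

    kw = 7                                       # hypothetical keyword token id
    left = decode_from(kw, Wb)[::-1]             # words preceding the keyword
    right = decode_from(kw, Wf)                  # words following the keyword
    print(left + [kw] + right)                   # keyword-anchored caption skeleton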
Source separation using dictionary learning and deep recurrent neural network with locality preserving constraint
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019516
Pham Tuan, Yuan-Shan Lee, S. Mathulaprangsan, Jia-Ching Wang
Abstract: Deep learning is a popular method for monaural source separation, especially for extracting a singing voice from a single-channel song. However, deep learning-based source separation ignores the geometric structure of the input data. This work develops a novel approach to source separation based on non-negative matrix factorization (NMF) and deep recurrent neural networks (DRNN) with a locality-preserving constraint. First, NMF is used to learn patterns from the training data; the learned patterns are linearly combined with the output of the DRNN. Second, a locality-preserving constraint is developed to exploit the inner structure of the input data in the DRNN learning process. Experimental results on the MIR-1K dataset reveal that the proposed algorithm outperforms the baselines.
Citations: 4
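The sketch below covers only the NMF half of the proposed pipeline — learning per-source bases on toy "spectrograms" and separating a mixture with a Wiener-style soft mask; the DRNN and the locality-preserving constraint are omitted, and all sizes are assumptions.

    import numpy as np

    def nmf(V, k, n_iter=200, eps=1e-9):
        # Multiplicative-update NMF: V ~= W @ H with all factors nonnegative.
        rng = np.random.default_rng(0)
        F, T = V.shape
        W, H = rng.random((F, k)), rng.random((k, T))
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    # Learn a basis per source from isolated magnitude spectrograms
    # (toy random F x T arrays stand in for real training data).
    rng = np.random.default_rng(1)
    V_voice, V_music = rng.random((64, 100)), rng.random((64, 100))
    Wv, _ = nmf(V_voice, 20)
    Wm, _ = nmf(V_music, 20)

    # Separate a mixture: fix the stacked bases, infer activations only,
    # then form a Wiener-style soft mask for the voice source.
    V_mix = V_voice + V_music
    W = np.hstack([Wv, Wm])
    H = np.random.default_rng(2).random((40, 100))
    for _ in range(200):
        H *= (W.T @ V_mix) / (W.T @ W @ H + 1e-9)
    voice_est = (Wv @ H[:20]) / (W @ H + 1e-9) * V_mix
    print(voice_est.shape)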
Automatic skin and hair masking using fully convolutional networks
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019339
Siyang Qin, Seongdo Kim, R. Manduchi
Abstract: Selfies have become commonplace. More and more people take pictures of themselves and enjoy enhancing these pictures with a variety of image processing techniques. One functionality of specific interest is automatic skin and hair segmentation, as it allows skin and hair to be processed separately. Traditional approaches require user input in the form of fully specified trimaps, or at least of "scribbles" indicating foreground and background areas, with high-quality masks then generated via matting. Manual input, however, can be difficult or tedious, especially on a smartphone's small screen. In this paper, we propose the use of fully convolutional networks (FCN) and a fully connected CRF to perform pixel-level semantic segmentation into skin, hair, and background. The trimap thus generated is given as input to a standard matting algorithm, yielding accurate skin and hair alpha masks. Our method achieves state-of-the-art performance on the LFW Parts dataset [1]. The effectiveness of our method is also demonstrated with a specific application case.
Citations: 16
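A natural glue step in this pipeline is converting the FCN+CRF label map into a trimap for the matting stage. The following sketch shows one plausible way to do that (erode each region for confident pixels, leave an unknown band between them); the band width and class ids are assumptions, not the paper's exact procedure.

    import numpy as np
    from scipy.ndimage import binary_erosion

    def labels_to_trimap(labels, cls, band=5):
        # Turn a per-pixel label map into a trimap for matting: confident
        # foreground/background after erosion, an 'unknown' ring between.
        fg = labels == cls
        fg_core = binary_erosion(fg, iterations=band)   # sure foreground
        bg_core = binary_erosion(~fg, iterations=band)  # sure background
        trimap = np.full(labels.shape, 128, np.uint8)   # unknown band
        trimap[fg_core] = 255
        trimap[bg_core] = 0
        return trimap

    # Toy label map: 0=background, 1=skin, 2=hair (assumed class ids)
    labels = np.zeros((64, 64), int)
    labels[16:48, 16:48] = 1
    print(np.unique(labels_to_trimap(labels, cls=1)))   # [0 128 255]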
Hybrid color attribute compression for point cloud data
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019426
Li Cui, Haiyan Xu, E. Jang
Abstract: This paper proposes a color attribute compression method for MPEG Point Cloud Compression (PCC) that exploits the spatial redundancy among adjacent points. With the increased interest in representing real-world surfaces as 3D point clouds, compressing point cloud attributes (i.e., colors and normal directions) has attracted great attention in MPEG. The proposed method groups adjacent points into blocks and supports two encoding modes for each block: run-length encoding and palette encoding. The final mode for each block is determined by comparing the distortion values produced by the two modes. Experimental results show that the proposed approach achieves a compression ratio about 28 percent better than that of MPEG PCC.
Citations: 10
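As a loose illustration of per-block mode selection between run-length and palette coding, here is a toy sketch; the rate-distortion cost model, palette construction, and block layout are placeholders rather than the paper's actual distortion-comparison criterion.

    import numpy as np

    def run_length(colors):
        # Lossless run-length encoding of a block of RGB colors.
        runs, prev, n = [], colors[0], 1
        for c in colors[1:]:
            if np.array_equal(c, prev):
                n += 1
            else:
                runs.append((prev, n)); prev, n = c, 1
        runs.append((prev, n))
        return runs

    def palette_encode(colors, k=4):
        # Quantize the block to its k most frequent colors (a crude stand-in
        # for palette construction) and index every point into the palette.
        uniq, counts = np.unique(colors, axis=0, return_counts=True)
        palette = uniq[np.argsort(-counts)][:k]
        d = ((colors[:, None].astype(float) - palette[None]) ** 2).sum(-1)
        return palette, d.argmin(1)

    def encode_block(colors, lam=0.1):
        # Per-block mode decision with a toy rate-distortion cost:
        # rate ~ symbol count, distortion ~ SSE (placeholder model).
        runs = run_length(colors)
        cost_rle = 4 * len(runs)                 # color + run per entry
        palette, idx = palette_encode(colors)
        sse = ((colors - palette[idx]) ** 2).sum()
        cost_pal = 3 * len(palette) + len(idx) + lam * sse
        return ("RLE", runs) if cost_rle <= cost_pal else ("PAL", (palette, idx))

    block = np.repeat(np.array([[255, 0, 0], [0, 255, 0]]), [20, 12], axis=0)
    print(encode_block(block)[0])                # long runs -> RLE wins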
Weakly structured information aggregation for upper-body posture assessment using ConvNets
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019410
Zewei Ding, W. Li, Pichao Wang, P. Ogunbona, Ling Qin
Abstract: Posture assessment aims to determine the risk associated with poor posture and thus avoid injury to subjects. Upper-body posture assessment from images offers an attractive alternative to manual methods by directly extracting relevant features for classification. A deep convolutional neural network is proposed that extracts structured features from different body parts and learns shared features used to determine the appropriate assessment. The structured features are learned with triplet-based rank constraints applied to the head and torso separately; the shared features and the assessment function are learned with soft-max constraints based on posture-risk measurements. Experimental evaluation on a self-collected upper-body posture dataset has verified the efficacy of the proposed method and network architecture.
Citations: 0
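The two loss terms the abstract names — triplet-based rank constraints for the structured features and a soft-max term for the shared assessment — can be sketched generically as below; shapes, the margin, and the unweighted combination are assumptions, not the paper's training setup.

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Margin-based rank constraint: pull same-risk-level features
        # together, push different-level features apart.
        d_ap = ((anchor - positive) ** 2).sum(-1)
        d_an = ((anchor - negative) ** 2).sum(-1)
        return np.maximum(d_ap - d_an + margin, 0).mean()

    def softmax_xent(logits, labels):
        # Soft-max classification term over posture-risk levels.
        z = logits - logits.max(1, keepdims=True)
        p = np.exp(z) / np.exp(z).sum(1, keepdims=True)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

    # Combined objective sketch: structured (triplet) + shared (soft-max)
    rng = np.random.default_rng(0)
    f = lambda: rng.normal(size=(8, 16))         # placeholder part features
    loss = triplet_loss(f(), f(), f()) + softmax_xent(
        rng.normal(size=(8, 3)), rng.integers(0, 3, 8))
    print(float(loss))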
Restoration of sea surface temperature images by learning-based and optical-flow-based inpainting
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019401
S. Shibata, M. Iiyama, Atsushi Hashimoto, M. Minoh
Abstract: Sea surface temperature (SST) images taken from satellites are partially occluded by clouds. In this paper, we propose an inpainting approach for restoring such partially occluded images. Assuming the sparseness of SST images, we employ learning-based inpainting to fill the occluded parts. Images taken over the past several days are another clue for filling the occluded parts: regarded as time-series data, they make a video inpainting method applicable as well. We employ PCA-based inpainting as the learning-based approach and optical-flow-based inpainting as the video inpainting, and combine the two restored images according to their expected restoration errors. Experimental results on real satellite images show the effectiveness of our method.
Citations: 1
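As a hedged sketch of two of the ingredients described — PCA-subspace inpainting with observed pixels held fixed, and fusion of two restorations weighted by expected error — the following uses toy flattened images; the error estimates and the optical-flow branch are stand-ins.

    import numpy as np

    def pca_inpaint(x, mask, basis, mean, n_iter=20):
        # Fill cloud-occluded pixels by iteratively projecting onto a PCA
        # subspace learned from cloud-free SST images; mask is True where
        # the pixel was actually observed.
        y = np.where(mask, x, mean)
        for _ in range(n_iter):
            recon = mean + basis @ (basis.T @ (y - mean))
            y = np.where(mask, x, recon)         # keep observed pixels fixed
        return y

    def fuse(est_a, est_b, err_a, err_b):
        # Inverse-error weighting of the two restorations.
        wa, wb = 1.0 / (err_a + 1e-9), 1.0 / (err_b + 1e-9)
        return (wa * est_a + wb * est_b) / (wa + wb)

    rng = np.random.default_rng(0)
    train = rng.normal(size=(50, 256))           # 50 cloud-free images (toy)
    mean = train.mean(0)
    U, S, Vt = np.linalg.svd(train - mean, full_matrices=False)
    basis = Vt[:10].T                            # top-10 principal directions

    x = train[0].copy()
    mask = rng.random(256) > 0.3                 # ~30% of pixels "clouded"
    pca_est = pca_inpaint(x, mask, basis, mean)
    flow_est = x                                 # stand-in for the flow result
    print(fuse(pca_est, flow_est, err_a=0.5, err_b=1.0).shape)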
Fashion analysis with a subordinate attribute classification network
2017 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2017-07-01 | DOI: 10.1109/ICME.2017.8019354
Huijing Zhan, Boxin Shi, A. Kot
Abstract: In this paper we deal with two image-based object search tasks in the fashion domain: clothing attribute prediction and cross-domain shoe retrieval. Clothing attribute prediction describes the appearance of clothes via semantic attributes, while cross-domain shoe retrieval aims at retrieving the same shoe items from online stores given a daily-life shoe photo. We jointly solve these two problems with a novel Subordinate Attribute Convolutional Neural Network (SA-CNN), whose newly designed loss function systematically merges semantic attributes of closer visual appearance, preventing images with obvious visual differences from being confused with one another. A three-level feature representation is further developed on top of SA-CNN for shoes from different domains. The experimental results demonstrate that clothing attribute prediction using the proposed SA-CNN achieves better performance than using traditional features or a fine-tuned conventional CNN. Moreover, for cross-domain shoe retrieval, the top-20 retrieval accuracy with deep features extracted from SA-CNN improves significantly, by 43%, over that with pretrained CNN features.
Citations: 1
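One plausible reading of the "subordinate attribute" loss — merging visually close attributes at a coarse level while keeping a fine-level soft-max — can be sketched as below; the hierarchy, grouping, and equal weighting are hypothetical, not the published loss.

    import numpy as np

    # Hypothetical attribute hierarchy: fine attributes grouped by visual
    # similarity into coarse parents (all indices are toy placeholders).
    groups = {0: [0, 1], 1: [2, 3, 4]}           # coarse -> fine attribute ids

    def softmax(z):
        z = z - z.max(-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(-1, keepdims=True)

    def subordinate_loss(fine_logits, fine_label):
        # Combine a coarse-level term (merging similar attributes) with the
        # usual fine-level soft-max, so confusions inside a visually close
        # group are penalized less than confusions across groups.
        p_fine = softmax(fine_logits)
        coarse_label = next(c for c, fs in groups.items() if fine_label in fs)
        p_coarse = np.array([p_fine[fs].sum() for fs in groups.values()])
        return (-np.log(p_coarse[coarse_label] + 1e-12)
                - np.log(p_fine[fine_label] + 1e-12))

    print(subordinate_loss(np.array([2.0, 1.0, 0.1, 0.2, 0.3]), fine_label=1))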