{"title":"An object based graph representation for video comparison","authors":"Xin Feng, Yuanyi Xue, Yao Wang","doi":"10.1109/ICIP.2017.8296742","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296742","url":null,"abstract":"This paper develops a novel object based graph model for semantic video comparison. The model describes a video with detected objects as nodes, and relationship between the objects as edges in a graph. We investigated several spatial and temporal features as the graph node attributes, and different ways to describe the spatial-temporal relationship between objects as the edge attributes. To tackle the problem of erratic camera motion on the detected object, a global motion estimation and correction approach is proposed to reveal the true object trajectory. We further propose to evaluate the similarity between two videos by establishing the object correspondence between two object graphs through graph matching. The model is verified on a challenging user generated video dataset. Experiments show that our method outperforms other video representation frameworks in matching videos with the same semantic content. The proposed object graph provides a compact and robust semantic descriptor for a video, which can be used for applications such as video retrieval, clustering and summarization. The graph representation is also flexible to incorporate other features as node and edge attributes.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130292437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex coefficient representation for IIR bilateral filter","authors":"Norishige Fukushima, Kenjiro Sugimoto, S. Kamata","doi":"10.1109/ICIP.2017.8296724","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296724","url":null,"abstract":"In this paper, we propose an infinite impulse response (IIR) filtering with complex coefficients for Euclid distance based filtering, e.g. bilateral filtering. Recursive filtering of edge-preserving filtering is the most efficient filtering. Recursive bilateral filtering and domain transform filtering belong to this type. These filters measure the difference between pixel intensities by geodesic distance. Also, these filters do not have separability. The aspects make the filter sensitive to noises. Bilateral filtering does not have these issues, but it is time-consuming. In this paper, edge-preserving filtering with the complex exponential function is proposed. The resulting stack of these IIR filtering is merged to approximated edge-preserving in FIR filtering, which includes bilateral filtering. For bilateral filtering, a raised-cosine function is used for efficient approximation. The experimental results show that the proposed filter, named IIR bilateral filter, approximates well and the computational cost is low.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126217732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coupled analysis-synthesis dictionary learning for person re-identification","authors":"Lingchuan Sun, Yun Zhou, Zhuqing Jiang, Aidong Men","doi":"10.1109/ICIP.2017.8296304","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296304","url":null,"abstract":"In this paper, we propose a novel coupled dictionary learning method, namely coupled analysis-synthesis dictionary learning, to improve the performance of person re-identification in the non-overlapping fields of different camera views. Most of the existing coupled dictionary learning methods train a coupled synthesis dictionary directly on the original feature spaces, which limits the representation ability of the dictionary. To handle the diversities of different original spaces, We first employ local Fisher discriminant analysis (LFDA) to learn a common feature space for close relationship of the same people in different views. In order to enhance the representation power of the coupled synthesis dictionary, we then learn a coupled analysis dictionary by transforming the common feature space into the coupled feature space. Experimental results on two publicly available VIPeR and CUHK01 datasets have validated the effectiveness of the proposed method.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126337266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Mesh coding with predefined region-of-interest","authors":"Jonas El Sayeh Khalil, A. Munteanu, P. Lambert","doi":"10.1109/ICIP.2017.8296516","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296516","url":null,"abstract":"We introduce a novel functionality for wavelet-based irregular mesh codecs which allows for prioritizing at the encoding side a region-of-interest (ROI) over a background (BG), and for transmitting the encoded data such that the quality in these regions increases first. This is made possible by appropriately scaling wavelet coefficients. To improve the decoded geometry in the BG, we propose an ROI-aware inverse wavelet transform which only upscales the connectivity in the required regions. Results show clear bitrate and vertex savings. For a trivial front-back selection of the ROI and BG, rendering from the front saves up to 5 bits per vertex and up to 50% of the geometry, while appearing visually lossless.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121034816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hierarchical feature model for multi-target tracking","authors":"M. Ullah, A. Mohammed, F. A. Cheikh, Zhaohui Wang","doi":"10.1109/ICIP.2017.8296755","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296755","url":null,"abstract":"We propose a novel Hierarchical Feature Model (HFM) for multi-target tracking. The traditional tracking algorithms use handcrafted features that cannot track targets accurately when the target model changes due to articulation, illumination intensity variation or perspective distortions. Our HFM explore deep features to model the appearance of targets. Then, we use an unsupervised dimensionality reduction for sparse representation of the feature vectors to cope with the time-critical nature of the tracking problem. Subsequently, a Bayesian filter is adopted as the tracker and a discrete combinatorial optimization is considered for target association. We compare our proposed HFM against 4 state-of-the-art trackers using 4 benchmark datasets. The experimental results show that our HFM outperforms all the state-of-the-art methods in terms of both Multi Object Tracking Accuracy (MOTA) and Multi Object Tracking Precision (MOTP).","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"279 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121820506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content adaptive video summarization using spatio-temporal features","authors":"Hyunwoo Nam, C. Yoo","doi":"10.1109/ICIP.2017.8297034","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8297034","url":null,"abstract":"This paper proposes a video summarization method based on novel spatio-temporal features that combine motion magnitude, object class prediction, and saturation. Motion magnitude measures how much motion there is in a video. Object class prediction provides information about an object in a video. Saturation measures the colorfulness of a video. Con-volutional neural networks (CNNs) are incorporated for object class prediction. The sum of the normalized features per shot are ranked in descending order, and the summary is determined by the highest ranking shots. This ranking can be conditioned on the object class, and the high-ranking shots for different object classes are also proposed as a summary of the input video. The performance of the summarization method is evaluated on the SumMe datasets, and the results reveal that the proposed method achieves better performance than the summary of worst human and most other state-of-the-art video summarization methods.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114373978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep CNN with color lines model for unmarked road segmentation","authors":"Shashank Yadav, Suvam Patra, Chetan Arora, Subhashis Banerjee","doi":"10.1109/ICIP.2017.8296348","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296348","url":null,"abstract":"Road detection from a monocular camera is an important perception module in any advanced driver assistance or autonomous driving system. Traditional techniques [1, 2, 3, 4, 5, 6] work reasonably well for this problem, when the roads are well maintained and the boundaries are clearly marked. However, in many developing countries or even for the rural areas in the developed countries, the assumption does not hold which leads to failure of such techniques. In this paper we propose a novel technique based on the combination of deep convolutional neural networks (CNNs), along with color lines model [7] based prior in a conditional random field (CRF) framework. While the CNN learns the road texture, the color lines model allows to adapt to varying illumination conditions. We show that our technique outperforms the state of the art segmentation techniques on the unmarked road segmentation problem. Though, not a focus of this paper, we show that even on the standard benchmark datasets like KITTI [8] and CamVid [9], where the road boundaries are well marked, the proposed technique performs competitively to the contemporary techniques.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115763165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient cloud detection in remote sensing images using edge-aware segmentation network and easy-to-hard training strategy","authors":"Kun Yuan, Gaofeng Meng, D. Cheng, Jun Bai, Shiming Xiang, Chunhong Pan","doi":"10.1109/ICIP.2017.8296243","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296243","url":null,"abstract":"Detecting cloud regions in remote sensing image (RSI) is very challenging yet of great importance to meteorological forecasting and other RSI-related applications. Technically, this task is typically implemented as a pixel-level segmentation. However, traditional methods based on handcrafted or low-level cloud features often fail to achieve satisfactory performances from images with bright non-cloud and/or semitransparent cloud regions. What is more, the performances could be further degraded due to the ambiguous boundaries caused by complicated textures and non-uniform distribution of intensities. In this paper, we propose a multi-task based deep neural network for cloud detection in RSIs. Architecturally, our network is designed to combine the two tasks of cloud segmentation and cloud edge detection together to encourage a better detection near cloud boundaries, resulting in an end-to-end approach for accurate cloud detection. Accordingly, an efficient sample selection strategy is proposed to train our network in an easy-to-hard manner, in which the number of the selected samples is governed by a weight that is annealed until the entire training samples have been considered. Both visual and quantitative comparisons are conducted on RSIs collected from Google Earth. The experimental results indicate that our method can yield superior performance over the state-of-the-art methods.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123893505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstration of rapid frequency selective reconstruction for image resolution enhancement","authors":"Nils Genser, Jürgen Seiler, Markus Jonscher, André Kaup","doi":"10.1109/ICIP.2017.8297158","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8297158","url":null,"abstract":"Most algorithms for processing, transmitting, or displaying images require the samples being placed on a regular grid. However, there exist imaging scenarios where the samples to be processed are located on a mesh with non-integer positions. This happens for example in image super-resolution, image warping, image registration, or image acquisition using random sampling sensors. In addition, sampling an image at non-regular sensor positions offers several advantages, as the effective spatial resolution of an imaging sensor can be increased as shown in [1].","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128884332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cascaded temporal spatial features for video action recognition","authors":"Tingzhao Yu, Huxiang Gu, Lingfeng Wang, Shiming Xiang, Chunhong Pan","doi":"10.1109/ICIP.2017.8296542","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296542","url":null,"abstract":"Extracting spatial-temporal descriptors is a challenging task for video-based human action recognition. We decouple the 3D volume of video frames directly into a cascaded temporal spatial domain via a new convolutional architecture. The motivation behind this design is to achieve deep nonlinear feature representations with reduced network parameters. First, a 1D temporal network with shared parameters is first constructed to map the video sequences along the time axis into feature maps in temporal domain. These feature maps are then organized into channels like those of RGB image (named as Motion Image here for abbreviation), which is desired to preserve both temporal and spatial information. Second, the Motion Image is regarded as the input of the latter cascaded 2D spatial network. With the combination of the 1D temporal network and the 2D spatial network together, the size of whole network parameters is largely reduced. Benefiting from the Motion Image, our network is an end-to-end system for the task of action recognition, which can be trained with the classical algorithm of back propagation. Quantities of comparative experiments on two benchmark datasets demonstrate the effectiveness of our new architecture.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127524294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}