{"title":"Multi-glimpse LSTM with color-depth feature fusion for human detection","authors":"Hengduo Li, Jun Liu, Guyue Zhang, Yuan Gao, Yirui Wu","doi":"10.1109/ICIP.2017.8296412","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296412","url":null,"abstract":"With the development of depth cameras such as Kinect and Intel Realsense, RGB-D based human detection receives continuous research attention due to its usage in a variety of applications. In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to improve human detection performance. Furthermore, we propose a feature fusion strategy based on our MG-LSTM network to better incorporate the RGB and depth information. To the best of our knowledge, this is the first attempt to utilize an LSTM structure for RGB-D based human detection. Our method achieves superior performance on two publicly available datasets.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131840712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Offset aperture based hardware architecture for real-time depth extraction","authors":"W. Yun, Young-Gyu Kim, Yeongmin Lee, Jinyeon Lim, Wonseok Choi, Muhammad Umar Karim Khan, Asim Khan, Said Homidov, Pervaiz Kareem, Hyun Sang Park, C. Kyung","doi":"10.1109/ICIP.2017.8297112","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8297112","url":null,"abstract":"Due to the increasing demand for 3D applications, development of novel depth-sensing cameras is being actively pursued. However, most of these cameras still face the challenge of high energy consumption and slow speed in the depth extraction process. This becomes a serious bottleneck in embedded implementations, where real-time performance is required under power and area constraints. This work proposes the Offset Aperture (OA) camera, a new hardware architecture for fast, low-energy, and low-complexity depth extraction. Optimal implementations of pre-processing, cost-volume generation and cost aggregation are presented. The whole depth-extraction pipeline has been implemented on a Field Programmable Gate Array (FPGA). Overall, only 2.8% bad classification was achieved with the proposed system, which processes 37 VGA frames per second while consuming 0.224 μJ/pixel. The high accuracy, speed and low energy consumption of the proposed OA architecture make it suitable for embedded applications.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132983104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Greedy deep transform learning","authors":"Jyoti Maggu, A. Majumdar","doi":"10.1109/ICIP.2017.8296596","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296596","url":null,"abstract":"We introduce deep transform learning — a new tool for deep learning. A deeper representation is learnt by stacking one transform after another. The learning proceeds in a greedy way. The first layer learns the transform and features from the input training samples. Subsequent layers use the features (after activation) from the previous layers as training input. Experiments have been carried out against other deep representation learning tools — deep dictionary learning, stacked denoising autoencoder, deep belief network and PCANet (a version of convolutional neural network). Results show that our proposed technique outperforms all of these methods on the benchmark datasets compared (MNIST, CIFAR-10 and SVHN).","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129660208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional factor analysis inspired compressive sensing","authors":"Xin Yuan, Yunchen Pu","doi":"10.1109/ICIP.2017.8296341","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296341","url":null,"abstract":"We solve the compressive sensing problem via convolutional factor analysis, where the convolutional dictionaries are learned in situ from the compressed measurements. An alternating direction method of multipliers (ADMM) paradigm for compressive sensing inversion based on convolutional factor analysis is developed. The proposed algorithm provides reconstructed images as well as features, which can be directly used for recognition (e.g., classification) tasks. We demonstrate that using ∼ 30% (relative to pixel numbers) compressed measurements, the proposed model achieves classification accuracy on MNIST comparable to that obtained from the original data.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"723 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115505707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Occlusion detection in dense stereo estimation with convex optimization","authors":"Pauline Tan, A. Chambolle, P. Monasse","doi":"10.1109/ICIP.2017.8296741","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296741","url":null,"abstract":"In this paper, we propose a dense two-frame stereo algorithm which handles occlusion in a variational framework. Our method is based on a new regularization model which includes both a constraint on the occlusion width and a visibility constraint in nonoccluded areas. The minimization of the resulting energy functional is done by convex relaxation. A post-processing step then detects and fills the occluded regions. We also propose a novel dissimilarity measure that combines color and gradient comparisons with a variable relative weight, to benefit from the robustness of comparison based on local variations while avoiding the fattening effect it may generate.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130403981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polygonization of remote sensing classification maps by mesh approximation","authors":"Emmanuel Maggiori, Y. Tarabalka, G. Charpiat, P. Alliez","doi":"10.1109/ICIP.2017.8296343","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296343","url":null,"abstract":"The ultimate goal of land mapping from remote sensing image classification is to produce polygonal representations of Earth's objects, to be included in geographic information systems. This is most commonly performed by running a pixelwise image classifier and then polygonizing the connected components in the classification map. We propose a novel polygonization algorithm, which uses a labeled triangular mesh to approximate the input classification maps. The mesh is optimized in terms of an l1 norm with respect to the classifier's output. We use a rich set of optimization operators, which includes a vertex relocator, and add a topology preservation strategy. The method outperforms current approaches, yielding better accuracy with fewer vertices.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123908831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Superpixel-based color transfer","authors":"Rémi Giraud, Vinh-Thong Ta, N. Papadakis","doi":"10.1109/ICIP.2017.8296371","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296371","url":null,"abstract":"In this work, we propose a fast superpixel-based color transfer method (SCT) between two images. Superpixels make it possible to decrease the image dimension and to extract a reduced set of color candidates. We propose to use a fast approximate nearest neighbor matching algorithm in which we enforce match diversity by limiting repeated selection of the same superpixels. A fusion framework is designed to transfer the matched colors, and we demonstrate the improvement obtained over exact matching results. Finally, we show that SCT is visually competitive with state-of-the-art methods.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130837579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing offensive tactics in broadcast basketball videos via key player detection","authors":"Tsung-Yu Tsai, Yen-Yu Lin, H. Liao, Shyh-Kang Jeng","doi":"10.1109/ICIP.2017.8296407","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296407","url":null,"abstract":"We address offensive tactic recognition in broadcast basketball videos. As a crucial component of basketball video content understanding, tactic recognition is quite challenging because it involves multiple independent players, each of which has respective spatial and temporal variations. Motivated by the observation that most intra-class variations are caused by non-key players, we present an approach that integrates key player detection into tactic recognition. To save annotation cost, our approach can work on training data with only video-level tactic annotation, instead of key-player labels. Specifically, this task is formulated as an MIL (multiple instance learning) problem where a video is treated as a bag with its instances corresponding to subsets of the five players. We also propose a representation to encode the spatio-temporal interaction among multiple players. It turns out that our approach not only effectively recognizes the tactics but also precisely detects the key players.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133795658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust shape regularity criteria for superpixel evaluation","authors":"Rémi Giraud, Vinh-Thong Ta, N. Papadakis","doi":"10.1109/ICIP.2017.8296924","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8296924","url":null,"abstract":"Regular decompositions are necessary for most superpixel-based object recognition or tracking applications. So far in the literature, the regularity or compactness of a superpixel shape has mainly been measured by its circularity. In this work, we first demonstrate that such a measure is not adapted for superpixel evaluation, since it expresses circular appearance rather than regularity. Then, we propose a new metric that considers several shape regularity aspects: convexity, balanced repartition, and contour smoothness. Finally, we demonstrate that our measure is robust to scale and noise and enables more relevant comparison of superpixel methods.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115210310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based light fields representation and coding using geometry information","authors":"Xin Su, Mira Rizkallah, Thomas Maugey, C. Guillemot","doi":"10.1109/ICIP.2017.8297038","DOIUrl":"https://doi.org/10.1109/ICIP.2017.8297038","url":null,"abstract":"This paper describes a graph-based coding scheme for light fields (LF). It first adapts graph-based representations (GBR) to describe the color and geometry information of LF. Graph connections describing scene geometry capture inter-view dependencies. They are used as the support of a weighted Graph Fourier Transform (wGFT) to encode disoccluded pixels. The quality of the LF reconstructed from the graph is enhanced by adding extra color information to the representation for a subset of sub-aperture images. Experiments show that the proposed scheme yields rate-distortion gains compared with HEVC-based compression (directly compressing the LF as a video sequence with HEVC).","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123548418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}