{"title":"A Comparison between Anatomy-Based and Data-Driven Tree Models for Human Pose Estimation","authors":"H. Vu, Richardt H. Wilkinson, M. Lech, E. Cheng","doi":"10.1109/DICTA.2017.8227386","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227386","url":null,"abstract":"Tree structures are commonly used to model relationships between body parts for articulated Human Pose Estimation (HPE). Tree structures can be used to model relationships among feature maps of joints in a structured learning framework using Convolutional Neural Networks (CNNs). This paper proposes new data-driven tree models for HPE. The data-driven tree structures were obtained using the Chow-Liu Recursive Grouping (CLRG) algorithm, representing the joint distribution of human body joints and tested using the Leeds Sports Pose (LSP) dataset. The paper analyzes the effect of the variation of the number of nodes on the accuracy of the HPE. Experimental results showed that the data-driven tree model obtained 1% higher HPE accuracy compared to the traditional anatomy-based model. A further improvement of 0.5% was obtained by optimizing the number of nodes in the traditional anatomy-based model.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"407 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127600481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gate and Common Pathway Detection in Crowd Scenes Using Motion Units and Meta-Tracking","authors":"Abdullah N. Moustafa, Mohamed E. Hussein, W. Gomaa","doi":"10.1109/DICTA.2017.8227438","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227438","url":null,"abstract":"This paper proposes a new approach for analysing crowded video scenes. The proposed approach decomposes the scene motion dynamics into a graph of interconnected atomic elements of coherent motions named Motion Units (MUs). Different MUs cover scene's local regions with different size and shape, which can even overlap. MUs relationships are analysed to discover the scene entrances and exits. Dominant motion pathways are then discovered by meta-tracking of particles injected at the scene entrances and driven through MUs using their linear dynamical systems until reaching scene exits. A prototype is developed such that; MUs are constructed by tracklet clustering, MU's motion pattern is represented by a linear model, and the MUs relationships are defined by the continuation likelihood among their mean tracklets. The prototype was evaluated on the challenging New York Grand Central Station scene, as well as other crowded scenes, and it managed to outperform the state of the art approaches.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126325112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Orientation-Context Descriptor and Locality-Preserving Fisher Discrimination Dictionary Learning for Action Recognition","authors":"Renlong Pan, Lihong Ma, Yupeng Zhan, S. Cai","doi":"10.1109/DICTA.2017.8227395","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227395","url":null,"abstract":"This paper presents a novel local posture orientation-context descriptor, and proposes a FDDL(Fisher discriminant dictionary learning) method based on local orientation-preserving(LOP-FDDL) for sparse coding in action recognition task. To take full use of the information about the position of the local body-part related to the center of the torso, ant the spatial-temporal shape changes of the human body-parts, we extract orientation-context descriptors of local body-parts to express the local posture of human body. Our descriptors not only include orientation information, but and also include the information of geometric structure and motion of body-parts. In order to accurately express action sequences, we need to learn a discriminative dictionary with strong expressive power which consists of the information about categories and orientations of body-parts from the extracted posture descriptors. Hence, a discriminative dictionary learning model based on the manifold constraint of local orientation-preserving is proposed, and Fisher Criteria is considered in the sparse coding stage of this model, which makes the coding coefficients discriminative. Meanwhile, to improve the performance of dictionary and learning efficiency, we initialize the dictionary as a class-structured dictionary which is a block-structured dictionary with orientation information. Experimental results demonstrate that our proposed method is better than other related action recognition methods on Weizmann and KTH public datasets.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132557851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Unmixing and Deep Feature Learning for Hyperspectral Image Classification","authors":"F. Alam, J. Zhou, Lei Tong, Alan Wee-Chung Liew, Yongsheng Gao","doi":"10.1109/DICTA.2017.8227419","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227419","url":null,"abstract":"Image classification is one of the critical tasks in hyperspectral remote sensing. In recent years, significant improvement have been achieved by various classification methods. However, mixed spectral responses from different ground materials still create confusions in complex scenes. In this regard, unmixing approaches are being successfully carried out to decompose mixed pixels into a collection of spectral signatures. Considering the usefulness of these techniques, we propose to utilize the unmixing results as an input to classifiers for better classification accuracy. We propose a novel band group based structure preserving nonnegative matrix factorization (NMF) method to estimate the individual spectral responses from different materials within different ranges of wavelengths. Then we train a convolutional neural network (CNN) with the unmixing results to generate powerful features and eventually classify the data. This method is evaluated on a new dataset and compared with several state-of-the-art models, which shows the promising potential of our method.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134421716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fully-Convolutional Framework for Semantic Segmentation","authors":"Yalong Jiang, Z. Chi","doi":"10.1109/DICTA.2017.8227388","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227388","url":null,"abstract":"In this paper we propose a deep learning technique to improve the performance of semantic segmentation tasks. Previously proposed algorithms generally suffer from the over-dependence on a single modality as well as a lack of training data. We made three contributions to improve the performance. Firstly, we adopt two models which are complementary in our framework to enrich field-of-views and features to make segmentation more reliable. Secondly, we repurpose the datasets form other tasks to the segmentation task by training the two models in our framework on different datasets. This brings the benefits of data augmentation while saving the cost of image annotation. Thirdly, the number of parameters in our framework is minimized to reduce the complexity of the framework and to avoid over- fitting. Experimental results show that our framework significantly outperforms the current state-of-the-art methods with a smaller number of parameters and better generalization ability.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"40 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131532634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4K Ultra High Definition Video Coding Using Homogeneous Motion Discovery Oriented Prediction","authors":"Ashek Ahmmed, Afrin Rahman, M. Pickering, A. Naman","doi":"10.1109/DICTA.2017.8227385","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227385","url":null,"abstract":"State of the art video compression techniques use the motion model to approximate geometric boundaries of moving objects where motion discontinuities occur. Motion hints based inter- frame prediction paradigm moves away from this redundant approach and employs an innovative framework consisting of motion hint fields that are continuous and invertible, at least, over their respective domains. However, estimation of motion hint is computationally demanding, in particular for high resolution video sequences. Discovery of homogeneous motion models and their associated masks over the current frame and then use these models and masks to form a prediction of the current frame, provides a computationally simpler approach to video coding compared to motion hint. In this paper, the potential of this coherent motion model based approach, equipped with bigger blocks, is investigated for coding 4K Ultra High Definition (UHD) video sequences. Experimental results show a savings in bit rate of 4.68% is achievable over standalone HEVC.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131789374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Features Augmenting for Better Image Retrieval","authors":"Long Zhao, Yu Wang, Jien Kato","doi":"10.1109/DICTA.2017.8227461","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227461","url":null,"abstract":"Recently, a lot of works have shown the advantages of utilizing the deep descriptors, obtained from the features of the last convolution layer in CNNs, on image retrieval. In this paper, we focus on augmenting and fusing CNN features for the image retrieval task. We first investigate the effects of network rotation, and then propose two models for deep feature augmenting: single model augmenting and multiple model augmenting. For the single model augmenting, we expand the model by rotating and flipping the single network. While for the multiple model, we expand filters by connecting the different networks together. As to the fusion methods, we evaluate concatenation, average and max pooling. We conduct a thorough evaluation of the above models and fusion approaches, and show the state of the art performance of our proposed approach.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134503312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Tracking via Spatio-Temporally Weighted Multiple Instance Learning","authors":"Li Wang, Xiao'an Tang, Dongdong Li","doi":"10.1109/DICTA.2017.8227488","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227488","url":null,"abstract":"Due to the superiority in handling label ambiguity, multiple instance learning (MIL) has been introduced into adaptive tracking-by-detection methods to alleviate drift and yields promising tracking performance. However, the MIL tracker assumes that all samples in a positive bag contribute equally to the bag probability, which ignores sample importance. To address this issue, in this paper we propose a spatio- temporally weighted MIL (STWMIL) tracker which integrates temporal weight into the update scheme for Haar-like features and spatial weight into the bag probability function. Spatial weight for the positive sample near the target location is larger than that far from the target location, which means the former contributes more to the positive bag probability. Based on spatial weight, a novel bag probability function is proposed using the weighted Noisy-OR model. Temporal weight for the recently-acquired images is larger than that for the earlier observations, which means less modeling power is expended on old observations. Based on temporal weight, a novel update scheme with changing but convergent learning rate is derived with strict mathematic proof. Extensive experiments performed on the OTB-2013 tracking benchmark demonstrate that our proposed tracker achieves superior performance both qualitatively and quantitatively over several state-of- the-art trackers.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115449088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Virtual View Quality Enhancement Technique through a Learning of Synthesised Video","authors":"D. M. Rahaman, M. Paul","doi":"10.1109/DICTA.2017.8227397","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227397","url":null,"abstract":"With the development of displaying techniques, free viewpoint video (FVV) system shows its potential to provide immersive perceptual feeling by changing viewpoints. To provide this luxury, a large number of high quality views have to be synthesised from limited number of viewpoints. However, in this process, a portion of the background is occluded by the foreground object in the generated synthesised videos. Recent techniques, i.e. view synthesized prediction using Gaussian model (VSPGM) and adaptive weighting between warped and learned foregrounds indicate that learning techniques may fill occluded areas almost correctly. However, these techniques use temporal correlation by assuming that original texture of the target viewpoint are already available to fill up occluded areas which is not a practical solution. Moreover, if a pixel position experiences foreground once during learning, the existing techniques considered it as foreground throughout the process. However, the actual fact is that after experiencing a foreground a pixel position can be background again. To address the aforementioned issues, in the proposed view synthesise technique, we apply Gaussian mixture modelling (GMM) on the output images of inverse mapping (IM) technique for further improving the quality of the synthesised videos. In this technique, the foreground and background pixel intensities are refined from adaptive weights of the output of inverse mapping and the pixel intensities from the corresponding model(s) of the GMM. This technique provides a better pixel correspondence, which improves 0.10~0.46dB PSNR compared to the IM technique.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115832281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Grained Butterfly Recognition with Deep Residual Networks: A New Baseline and Benchmark","authors":"Lin Nie, Keze Wang, Xiaoling Fan, Yuefang Gao","doi":"10.1109/DICTA.2017.8227435","DOIUrl":"https://doi.org/10.1109/DICTA.2017.8227435","url":null,"abstract":"Thanks to the advances in deep learning techniques and the increasing size of training data, ground- breaking progress on image classification has recently been achieved. However, focusing on distinguishing usually hundreds of sub-categories belonging to the same basic-level category, fine- grained recognition of unusual natural object categories (e.g., a special type of insect) still remains challenging and needs to be solved. Due to mainly lack of sufficient annotated data, the state-of-the-art image classification approaches cannot well adapt to address the fine-grained challenges. In this paper, we study the problem of fine-grained butterfly recognition by introducing a new large-scale benchmark, which includes 82 butterfly categories. Moreover, we perform empirical study of the existing state-of-the-art image classification approaches and adopt ResNet as a new baseline. Extensive experiments under empirical settings demonstrate the superiority of the proposed baseline.","PeriodicalId":194175,"journal":{"name":"2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114321827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}