Xufeng Guo, S. Denman, C. Fookes, Luis Mejías Alvarez, S. Sridharan
{"title":"Automatic UAV Forced Landing Site Detection Using Machine Learning","authors":"Xufeng Guo, S. Denman, C. Fookes, Luis Mejías Alvarez, S. Sridharan","doi":"10.1109/DICTA.2014.7008097","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008097","url":null,"abstract":"The commercialization of aerial image processing depends heavily on platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allowing UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system based on machine learning approaches, including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters is analysed, including the number of Gaussian mixtures, the support vector kernel (linear, radial basis function (RBF), and polynomial), and the order of the RBF and polynomial kernels. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region-type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal that improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. 
Compared to the baseline system, the proposed system provides a significant improvement in terms of the chance of detecting a safe landing area, and its performance is more stable than the baseline's in the presence of changes in UAV altitude.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125261356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
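The classifier configuration the abstract reports as best (an SVM with an RBF kernel over colour and texture features) can be sketched with scikit-learn. This is an illustrative stand-in, not the authors' system: the feature vectors below are synthetic placeholders for per-pixel colour/texture descriptors, with class 0 playing the role of "safe landing area".

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-pixel colour/texture feature vectors:
# class 0 = "safe landing area", class 1 = "unsafe".
safe = rng.normal(loc=0.0, scale=1.0, size=(200, 6))
unsafe = rng.normal(loc=3.0, scale=1.0, size=(200, 6))
X = np.vstack([safe, unsafe])
y = np.array([0] * 200 + [1] * 200)

# RBF-kernel SVM, the configuration the abstract found to work best.
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
acc = clf.score(X, y)
```

In practice the kernel and its order (e.g. the polynomial degree, or the RBF gamma) would be selected by cross-validation, which is the parameter sweep the abstract describes.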
{"title":"A Non-Rigid 3D Multi-Modal Registration Algorithm Using Partial Volume Interpolation and the Sum of Conditional Variance","authors":"Mst. Nargis Aktar, M. Alam, M. Pickering","doi":"10.1109/DICTA.2014.7008088","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008088","url":null,"abstract":"Multi-modal medical image registration provides complementary information from the fusion of various medical imaging modalities. This paper presents a volume-based multi-modal affine registration algorithm to register images acquired using different magnetic resonance imaging (MRI) modes. In the proposed algorithm, the sum of conditional variance (SCV) similarity measure is used. The SCV is considered to be a state-of-the-art similarity measure for registering multi-modal images. However, its main drawback is that it uses only quantized information to calculate the joint histogram. To overcome this limitation, we propose to use partial volume interpolation (PVI) in the joint histogram calculation to improve the performance of the existing registration algorithm. To evaluate the performance of the registration algorithm, different similarity measures were compared in conjunction with gradient-based Gauss-Newton (GN) optimization to optimize the spatial transformation parameters. 
The experimental evaluation shows that the proposed approach provides a higher success rate and comparable accuracy to other methods that have been recently proposed for multi-modal medical image registration.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131715111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
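The core idea of partial volume interpolation is that a non-integer intensity sample contributes fractionally to the two nearest histogram bins instead of being hard-quantized. A minimal 1-D sketch of a PVI joint histogram (my simplification, not the paper's 3-D volume implementation) could look like this:

```python
import numpy as np

def joint_hist_pvi(ref, mov, bins=8):
    """Joint histogram where each moving-image sample's (possibly
    non-integer) intensity spreads linearly over its two nearest bins,
    instead of being quantized to one bin (1-D PVI sketch)."""
    H = np.zeros((bins, bins))
    r = np.clip(ref.astype(int), 0, bins - 1)
    m = np.clip(mov, 0, bins - 1 - 1e-9)
    lo = np.floor(m).astype(int)
    w_hi = m - lo                      # fractional part -> weight of upper bin
    for ri, li, w in zip(r, lo, w_hi):
        H[ri, li] += 1.0 - w
        H[ri, li + 1 if li + 1 < bins else li] += w
    return H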
S. Fehlmann, C. Pontecorvo, D. Booth, P. Janney, Robert Christie, N. Redding, Mike Royce, Merrilyn J. Fiebig
{"title":"Fusion of Multiple Sensor Data to Recognise Moving Objects in Wide Area Motion Imagery","authors":"S. Fehlmann, C. Pontecorvo, D. Booth, P. Janney, Robert Christie, N. Redding, Mike Royce, Merrilyn J. Fiebig","doi":"10.1109/DICTA.2014.7008110","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008110","url":null,"abstract":"This work addresses the problem of extracting semantics associated with multiple, cooperatively managed motion imagery sensors to support indexing and search of large imagery collections. The extracted semantics relate to the motion and identity of vehicles within a scene, viewed from aircraft and the ground. Semantic extraction required three steps: Video Moving Target Indication (VMTI), imagery fusion, and object recognition. VMTI used a previously published algorithm, with some novel modifications allowing detection and tracking in low frame rate Wide Area Motion Imagery (WAMI) and Full Motion Video (FMV). Following this, the data from multiple sensors were fused to identify the highest-resolution image corresponding to each moving object. A final recognition stage attempted to fit each delineated object to a database of 3D models to determine its type. A proof-of-concept has been developed to allow processing of imagery collected during a recent experiment using a state-of-the-art airborne surveillance sensor providing WAMI, with coincident narrower-area FMV sensors and simultaneous collection by a ground-based camera. 
An indication of the potential utility of the system was obtained using ground-truthed examples.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116520849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Focus Image Fusion via Boundary Finding and Multi-Scale Morphological Focus-Measure","authors":"Yu Zhang, X. Bai, Tao Wang","doi":"10.1109/DICTA.2014.7008116","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008116","url":null,"abstract":"Multi-focus image fusion extracts the focused regions from multiple images of the same scene and combines them to produce one fully focused image. The key is to find the focused regions in the source images. In this paper, we transform the problem of finding the focused regions into one of finding the boundaries between the focused and defocused regions in the source images, and propose a novel image fusion method via boundary finding and a multi-scale morphological focus-measure. Firstly, a morphological focus-measure, consisting of multi-scale morphological gradients, is proposed to measure the focus of the images. Secondly, a novel boundary finding method is presented, which utilizes the relations between the focus information of the source images. Thirdly, the found boundaries naturally segment the source images into regions with the same focus condition, and the focused regions can be selected simply by comparing the focus-measures of the corresponding regions. Fourthly, the detected focused regions are reconstructed to obtain the decision map for the multi-focus image fusion. Finally, the fused image is produced according to the decision map and the given fusion rule. 
Experimental results demonstrate that the proposed algorithm outperforms other spatial domain fusion algorithms.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115036032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
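A multi-scale morphological gradient of the kind the abstract describes sums the dilation-minus-erosion gradient over structuring elements of increasing size, so sharp (focused) regions score higher than blurred ones. A minimal sketch under assumed scales (the paper's exact scales and combination weights are not given here):

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def focus_measure(img, scales=(3, 5, 7)):
    """Multi-scale morphological gradient: sum of (dilation - erosion)
    over square structuring elements of increasing size. Higher values
    indicate stronger local contrast, i.e. in-focus content."""
    fm = np.zeros_like(img, dtype=float)
    for s in scales:
        fm += grey_dilation(img, size=(s, s)) - grey_erosion(img, size=(s, s))
    return fm
```

Comparing the per-region means of this measure between source images is then enough to decide which image is focused in each region.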
Pallab Kanti Podder, M. Paul, Manzur Murshed, Subrata Chakraborty
{"title":"Fast Intermode Selection for HEVC Video Coding Using Phase Correlation","authors":"Pallab Kanti Podder, M. Paul, Manzur Murshed, Subrata Chakraborty","doi":"10.1109/DICTA.2014.7008109","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008109","url":null,"abstract":"The recent High Efficiency Video Coding (HEVC) standard demonstrates higher rate-distortion (RD) performance compared to its predecessor H.264/AVC by using several new tools, especially larger and asymmetric inter-mode variable-size motion estimation and compensation. This requires more than four times the computational time of H.264/AVC. As a result, reducing the computational time while maintaining the standard video quality has been a major concern for researchers. Our motivation is to reduce the computational time through smart selection of the appropriate modes in HEVC. To accomplish this, we use phase correlation to approximate the motion information between current and reference blocks by comparing it with a number of different binary pattern templates, and then select a subset of motion estimation modes without exhaustively exploring all possible modes. The experimental results show that the proposed HEVC-PC (HEVC with Phase Correlation) scheme outperforms the standard HEVC scheme in terms of computational time while preserving the same quality of the video sequences. 
More specifically, encoding time is reduced by around 40% compared to exhaustive mode selection in HEVC.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134188018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
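Phase correlation, the building block this scheme relies on, estimates the translation between two blocks from the peak of the inverse FFT of their normalized cross-power spectrum. A self-contained sketch of that step (the template matching and mode-subset selection of HEVC-PC are not reproduced here):

```python
import numpy as np

def phase_correlation(cur, ref):
    """Estimate the (dy, dx) translation between two equal-size blocks
    from the phase of their cross-power spectrum: the inverse FFT of the
    normalized spectrum peaks at the shift."""
    F1 = np.fft.fft2(cur)
    F2 = np.fft.fft2(ref)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12      # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the block size to negative values.
    h, w = corr.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

The magnitude and direction of the estimated shift are what a scheme like this can match against binary motion-pattern templates to rule out unlikely partition modes.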
{"title":"Infrared Ship Target Image Smoothing Based on Adaptive Mean Shift","authors":"Zhaoying Liu, Changming Sun, X. Bai, F. Zhou","doi":"10.1109/DICTA.2014.7008113","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008113","url":null,"abstract":"Infrared (IR) image denoising is important for IR image analysis. In this paper, we propose a method based on adaptive range bandwidth mean shift for IR ship target image smoothing, aiming to effectively suppress noise as well as preserve important target structures. First, local image properties, including the mean value and standard deviation, are combined to build a salient region map, and a thresholding method is applied to obtain a binary mask on the target region. Then, we develop an adaptive range bandwidth mean shift method for image denoising. By associating the range bandwidth of the mean shift with local region saliency, we can adjust the bandwidth adaptively, thus smoothing the background region while preserving important target structures. Experimental results show that this method works well for IR ship target images with different backgrounds. It demonstrates superior performance for image denoising and target preservation compared with some existing image denoising methods.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129477183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
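The key mechanism, a per-pixel range bandwidth that controls how aggressively each region is smoothed, can be illustrated with a toy range-domain mean shift over a 3x3 neighbourhood. This is a simplified sketch, not the authors' algorithm: here the bandwidth map is supplied directly, whereas the paper derives it from a saliency map so that salient (target) pixels get a small bandwidth and background pixels a large one.

```python
import numpy as np

def mean_shift_smooth(img, bandwidths, iters=5):
    """Per-pixel range-domain mean shift over a 3x3 neighbourhood: each
    pixel moves toward the mean of the neighbours that lie within its own
    (adaptive) range bandwidth. A small bandwidth preserves structure; a
    large one smooths."""
    out = img.astype(float).copy()
    for _ in range(iters):
        p = np.pad(out, 1, mode="edge")
        nxt = out.copy()
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                win = p[i:i + 3, j:j + 3]
                mask = np.abs(win - out[i, j]) <= bandwidths[i, j]
                nxt[i, j] = win[mask].mean()   # centre pixel always included
        out = nxt
    return out
```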
Muhammad Rizwan Khokher, A. Bouzerdoum, S. L. Phung
{"title":"Crowd Behavior Recognition Using Dense Trajectories","authors":"Muhammad Rizwan Khokher, A. Bouzerdoum, S. L. Phung","doi":"10.1109/DICTA.2014.7008098","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008098","url":null,"abstract":"This article presents a new method for crowd behavior recognition, using dynamic features extracted from dense trajectories. The histogram of oriented gradient and motion boundary histogram descriptors are computed at dense points along motion trajectories, which are tracked using median filtering and displacement information obtained from a dense optical flow field. Then a global representation of the scene is obtained using a bag-of-words model of the extracted features. Locality-constrained linear encoding with sum pooling and L2-plus-power normalization is employed in the bag-of-words model. Finally, a support vector machine classifier is trained to recognize the crowd behavior in a short video sequence. The proposed method is tested on two benchmark datasets, and its performance is compared with those of some existing methods. Experimental results show that the proposed approach achieves a classification rate of 93.8% on the PETS2009 S3 dataset and an area-under-the-curve score of 0.985 on the UMN dataset.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"65 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128713807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
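The bag-of-words stage of such a pipeline quantizes local descriptors against a learned codebook, pools the assignments into a histogram, and normalizes it before SVM classification. A simplified sketch with hard assignment standing in for the locality-constrained linear encoding the abstract uses; the descriptors are random placeholders for HOG/MBH features along trajectories:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Hypothetical local descriptors (stand-ins for HOG/MBH along dense
# trajectories), stacked across many training videos.
train_desc = rng.random((1000, 16))
codebook = KMeans(n_clusters=32, n_init=3, random_state=0).fit(train_desc)

def bow_vector(descriptors):
    """Hard-assignment bag-of-words histogram with power plus L2
    normalization (a simplified stand-in for locality-constrained
    linear encoding with sum pooling)."""
    words = codebook.predict(descriptors)
    h = np.bincount(words, minlength=32).astype(float)
    h = np.sqrt(h)                            # power normalization
    return h / (np.linalg.norm(h) + 1e-12)    # L2 normalization
```

Each video sequence is reduced to one such fixed-length vector, which is what the SVM is trained on.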
{"title":"Dual Graph Regularized NMF for Hyperspectral Unmixing","authors":"Lei Tong, J. Zhou, Xiao Bai, Yongsheng Gao","doi":"10.1109/DICTA.2014.7008103","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008103","url":null,"abstract":"Hyperspectral unmixing is an important technique for estimating the fractions of different land cover types from remote sensing imagery. In recent years, nonnegative matrix factorization (NMF) with various constraints has been introduced into hyperspectral unmixing. Among these methods, graph-based constraints have proven useful in capturing the latent manifold structure of the hyperspectral data in the feature space. In this paper, we propose to integrate graph-based constraints, based on the manifold assumption in the feature space and the consistency of the spatial space, to regularize the NMF method. Results on both synthetic and real data have validated the effectiveness of the proposed method.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128852137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
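At its core, NMF-based unmixing factorizes the observed pixel spectra X into nonnegative abundances and endmember signatures, X ≈ AE. A plain-NMF sketch on a synthetic linear-mixing model (the paper's dual graph regularizers are omitted; all data below are synthetic):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
# Toy linear-mixing model: 3 endmember spectra (rows of E), per-pixel
# abundance fractions A summing to 1, observed pixels X = A @ E.
E = rng.random((3, 50))                 # hypothetical endmember signatures
A = rng.dirichlet(np.ones(3), size=400)
X = A @ E

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
A_hat = model.fit_transform(X)          # estimated abundances (up to scale/order)
E_hat = model.components_               # estimated endmembers
recon_err = np.linalg.norm(X - A_hat @ E_hat) / np.linalg.norm(X)
```

Graph regularizers of the kind the paper proposes would add penalty terms tying A_hat to the spectral-feature and spatial neighbourhood graphs, which plain NMF ignores.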
{"title":"Image Segmentation Based on Spatially Coherent Gaussian Mixture Model","authors":"Guangpu Shao, Junbin Gao, Tianjiang Wang, Fang Liu, Yucheng Shu, Yong Yang","doi":"10.1109/DICTA.2014.7008111","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008111","url":null,"abstract":"It has been demonstrated that a finite mixture model (FMM) with Gaussian distributions is a powerful tool for modeling the probability density function of image data, with wide applications in computer vision and image analysis. We propose a simple yet effective way to enhance the robustness of finite mixture models by incorporating local spatial constraints. It is natural to assume that the label of an image pixel is influenced by those of its neighboring pixels. We use a mean template to represent local spatial constraints. Our algorithm improves on other mixture models based on Markov random fields (MRFs), as it avoids inferring the posterior field distribution and choosing the temperature parameter. We use the expectation maximization (EM) algorithm to optimize all the model parameters. Moreover, the proposed algorithm is entirely free of empirically adjusted hyperparameters. The idea used in our method can also be adopted in other mixture models. 
Several experiments on synthetic and real-world images have been conducted to demonstrate the effectiveness, efficiency, and robustness of the proposed method.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116990829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
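The baseline this work builds on, segmenting an image by fitting a GMM to pixel intensities and labelling each pixel with its most likely component, can be sketched as follows. The spatial mean-template constraint that distinguishes the paper's model is omitted; the two-region "image" is synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Synthetic two-region "image": dark left half, bright right half, plus noise.
img = np.hstack([rng.normal(0.2, 0.05, (32, 16)),
                 rng.normal(0.8, 0.05, (32, 16))])

# Plain GMM over pixel intensities, fitted with EM; the paper additionally
# smooths each pixel's posterior with a mean template over its
# neighbourhood before relabelling (not done here).
gmm = GaussianMixture(n_components=2, random_state=0).fit(img.reshape(-1, 1))
labels = gmm.predict(img.reshape(-1, 1)).reshape(img.shape)
```

Without a spatial constraint, isolated noisy pixels can flip label; the mean-template idea suppresses exactly those isolated flips.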
{"title":"The Influence of Temporal Information on Human Action Recognition with Large Number of Classes","authors":"O. V. R. Murthy, Roland Göcke","doi":"10.1109/DICTA.2014.7008131","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008131","url":null,"abstract":"Human action recognition from video input has seen much interest over the last decade. In recent years, the trend is clearly towards action recognition in real-world, unconstrained conditions (i.e. not acted) with an ever growing number of action classes. Much of the work so far has used single frames or sequences of frames where each frame was treated individually. This paper investigates the contribution that temporal information can make to human action recognition in the context of a large number of action classes. The key contributions are: (i) We propose a complementary information channel to the Bag-of-Words framework that models the temporal occurrence of the local information in videos. (ii) We investigate the influence of sensible local information, whose temporal occurrence is more vital than the local information itself. The experimental validation on action recognition datasets with the largest number of classes to date shows the effectiveness of the proposed approach.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121685901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}