{"title":"1-Point Rigid Motion Estimation and Segmentation with a RGB-D Camera","authors":"Samunda Perera, N. Barnes","doi":"10.1109/DICTA.2013.6691469","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691469","url":null,"abstract":"RGB-D cameras like Microsoft Kinect that provide color and dense depth images have now become commonplace. We consider the problem of estimation and segmentation of multiple rigid body motions observed by such a camera. On the basis of differential geometry of surfaces and image gradients, we present a method for completely estimating the Euclidean transformation of a rigid body by using just a single surface point correspondence. This is facilitated by two methods of removing the sign ambiguity of principal curvature directions which is the main contribution of the paper. Further, we apply state-of-the-art rotation/translation averaging techniques to achieve refined Euclidean transformation estimates and segmentation. Results using both synthetic and real RGB-D data show the validity of our approach.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122883154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating the Divisive Information-Theoretic Clustering of Visual Words","authors":"Jianjia Zhang, Lei Wang, Lingqiao Liu, Luping Zhou, W. Li","doi":"10.1109/DICTA.2013.6691476","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691476","url":null,"abstract":"Word clustering is an effective approach in the bag-of-words model to reducing the dimensionality of high-dimensional features. In recent years, the bag-of-words model has been successfully introduced into visual recognition and significantly developed. Often, in order to adequately model the complex and diversified visual patterns, a large number of visual words are used, especially in the state-of-the-art visual recognition methods. As a result, the existing word clustering algorithms become not computationally efficient enough. They can considerably prolong the process such as model updating and parameter tuning, where word clustering needs to be repeatedly employed. In this paper, we focus on the divisive information-theoretic clustering, one of the most efficient word clustering algorithms in the field of text analysis, and accelerate its speed to better deal with a large number of visual words. We discuss the properties of its cluster membership evaluation function, KL-divergence, in both binary and multi-class classification cases and develop the accelerated versions in two different ways. Theoretical analysis shows that the proposed accelerated divisive information-theoretic clustering algorithm can handle a large number of visual words in a much more efficient manner. As demonstrated on the benchmark datasets in visual recognition, it can achieve speed-up by hundreds of times while well maintaining the clustering performance of the original algorithm.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124188648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Depth to Extend Randomised Hough Forests for Object Detection and Localisation","authors":"R. Palmer, G. West, T. Tan","doi":"10.1109/DICTA.2013.6691536","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691536","url":null,"abstract":"Implicit Shape Models (ISM) have been developed for object detection and localisation in 2-D (RGB) imagery and, to a lesser extent, full 3-D point clouds. Research is ongoing to extend the approach to 2-D imagery having co-registered depth (RGB-D) e.g. from stereoscopy, laser scanning, time-of-flight cameras etc. A popular implementation of the ISM is as a Randomised Forest of classifier trees representing codebooks for use in a Hough Transform voting framework. We present three extensions to the Class-Specific Hough Forest (CSHF) that utilises RGB and co-registered depth imagery acquired via stereoscopic mobile imaging. We demonstrate how depth and RGB information can be combined during training and at detection time. Rather than encoding depth as a new dimension of Hough space (which can increase vote sparsity), depth is used to modify the resulting placement and strength of votes in the original 2-D Hough space. We compare the effect of these depth-based extensions to the unmodified CSHF detection framework evaluated against a challenging new real-world dataset of urban street scenes.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116865567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing Polarisation Components from Unpolarised Images","authors":"Lin Gu, C. P. Huynh, A. Robles-Kelly","doi":"10.1109/DICTA.2013.6691518","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691518","url":null,"abstract":"In this paper, we develop a method for reconstructing the polarisation components from unpolarised imagery. Our approach rests on a model of polarisation which accounts for reflection from rough surfaces illuminated at moderate and large angles of incidence. Departing from the microfacet structure of rough surfaces, we relate the maximal and minimal polarimetric intensities to the diffuse and specular components of an un-polarised image via the Fresnel reflection theory. This allows us to reconstruct the polarimetric components from a single unpolarised image. Thus, the model presented here provides a link between the microfacet structure and polarisation of light upon reflection from rough surfaces. We evaluate the accuracy of the reconstructed polarisation components and illustrate the utility of the method for the simulation of a polarising filter on real-world images.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116659191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Data Modelling Using Thin Plate Splines","authors":"Ruwan Tennakoon, A. Bab-Hadiashar, D. Suter, Z. Cao","doi":"10.1109/DICTA.2013.6691522","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691522","url":null,"abstract":"Using splines to model spatio-temporal data is one of the most common methods of data fitting used in a variety of computer vision applications. Despite its ubiquitous applications, particularly for volumetric image registration and interpolation, the existing estimation methods are still sensitive to the existence of noise and outliers. A method of robust data modelling using thin plate splines, based upon the well-known least K-th order statistical model fitting, is proposed and compared with the best available robust spline fitting techniques. Our experiments show that existing methods are not suitable for typical computer vision applications where outliers are structured (pseudo-outliers) while the proposed method performs well even when there are numerous pseudo-outliers.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128671314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Detection and Recognition Algorithms to Persistent Wide Area Surveillance","authors":"S. Fehlmann, D. Booth, P. Janney, C. Pontecorvo, Peter Aquilina, T. Scoleri, N. Redding, Robert Christie","doi":"10.1109/DICTA.2013.6691482","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691482","url":null,"abstract":"The persistent airborne surveillance of large geographical areas is now a viable proposition. As well as providing cues to moving objects, it presents new opportunities for understanding the behaviours and motivations of people, both individually and collectively. Exploitation of these huge collections of imagery (a facet of the Big Data challenge) requires more effective tools to derive and abstract useful information to cue the analyst. This paper describes a new system which brings together a number of techniques: moving target detection; tracking; recognition and photogrammetry, to address wide area surveillance problems. We provide a first report on the demands this places on component parts and interfaces. Significantly, we adopt international interoperability standards, particularly with regard to video metadata, to constrain the solution space. We also describe new performance improvements to the video moving target indication and photogrammetry algorithms as well as analysing for the first time the performance of our integrated target model matching capability in our automated system.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"12397 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123232873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Visual Vocabulary Tracking Using Hierarchical Model Fusion","authors":"B. Bozorgtabar, Roland Göcke","doi":"10.1109/DICTA.2013.6691525","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691525","url":null,"abstract":"In this paper, we propose a new visual tracking approach based on the Hierarchical Model Fusion framework, which fuses two different trackers to cope with different tracking problems. We use an Incremental Multiple Principal Component Analysis tracker as our main model as well as an image patch tracker as our auxiliary model. Firstly, we randomly sample image patches within the target region obtained by the main model in the training frames for constructing a visual vocabulary using Histogram of Oriented Gradient features. Secondly, we use a supervised learning algorithm based on a Gaussian Mixture Model, which not only operates on supervised information to improve the discriminative power of the clusters, but also increases the purity of the clusters. Then, auxiliary models are initialised by obtaining confidence scores of image patches based on the similarity between candidates and codewords. In addition, an updating procedure and a result refinement scheme are included in the proposed tracking approach. Experiments on challenging video sequences demonstrate the robustness of the proposed approach to handling occlusion, pose variation and rotation.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130274905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chaining Convolution and Correlation in Practice: A Case Study in Visual Tracking","authors":"D. Ward, Ivan Lee, D. Kearney, S. Wong","doi":"10.1109/DICTA.2013.6691491","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691491","url":null,"abstract":"Two dimensional convolution and cross-correlation operations are used in many image processing and computer vision applications, with algorithms commonly using a number of these operations. It is well known that these operations can be performed quickly by using a FFT to reduce computational complexity. In this paper we investigate the extent that the structure of algorithms with multiple convolution and cross-correlation operations can be exploited to further reduce computational complexity. Using the CACTuS visual tracking algorithm as a case study, we demonstrate how successive convolution and correlation operations may be chained together in the Fourier domain by taking into account the growth and shift of the output. We experimentally demonstrate that our chaining technique can result in run-time reductions of up to 55% when compared to the individual FFT approach.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132795399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Estimation of Light Source Position and Reflectance of Real Objects for Mixed-Reality Application","authors":"Masahide Kobayashi, Y. Manabe, Noriko Yata","doi":"10.1109/DICTA.2013.6691501","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691501","url":null,"abstract":"The seamless integration of real and virtual objects is required for mixed-reality applications. To achieve this goal, we should represent an effect of light reflection like shading, shadowing and inter-reflection between the real and virtual objects. To represent these effects, we have to estimate reflectance of the real objects. The reflectance can be estimated with color and geometry of the objects and light condition of the scene. To calculate at an interactive frame rates, the light sources are distributed on a surface of a dome above the scene. To estimate the reflectance more accurately, we have to calculate distance from the objects to the light source. Therefore, this paper proposes a method to estimate the distance from the objects to the light source and the reflectance of the objects at an interactive frame rates. In the proposed method, two cameras and a marker with a spherical mirror are used. We can use an RGB camera and an IR camera of Microsoft Kinect sensor as the cameras. In other words, by use of the proposed method, we can estimate the distance and reflectance by using the Kinect and the marker with the spherical mirror. In the method, intersection points of reflection vectors on the spherical mirror at each camera are evaluated and the point which has the maximum evaluation value is regarded as an estimation value of the light source position. With the proposed method, we can estimate the light source position and reflectance of the real objects at an interactive frame rates by use of the Kinect and the marker with the spherical mirror, so that we are able to apply the method to various mixed-reality applications.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122484775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy Improvement of Melanosome Tracking by Error Correction","authors":"Toshiaki Okabe, K. Hotta","doi":"10.1109/DICTA.2013.6691477","DOIUrl":"https://doi.org/10.1109/DICTA.2013.6691477","url":null,"abstract":"This paper proposes an error correction method for improving accuracy of melanosome tracking. Melanosomes in intracellular images are tracked manually to investigate the cause of disease, and an automatic tracking method is desirable. We detect all melanosome candidates by SIFT with 2 different parameters. Of course, the SIFT also detects non-melanosomes. Therefore, we use the 4-valued difference image (4-VDimage) to eliminate non-melanosome candidates. After tracking melanosome, we track the melanosome with low confidence again from t+1 to t. If the results from t to t+1 and from t+1 to t are different, we judge that initial tracking result is a failure, the melanosome is eliminated from candidates and re-tracking is carried out. Experimental results demonstrate that our method can correct the error and improves the accuracy.","PeriodicalId":231632,"journal":{"name":"2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128069243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}