{"title":"Real Time Ray Tracing of Analytic and Implicit Surfaces","authors":"Finn Petrie, S. Mills","doi":"10.1109/IVCNZ51579.2020.9290653","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290653","url":null,"abstract":"Real-time ray-tracing debuted to consumer GPU hardware in 2018. Primary examples however, have been of hybrid raster and ray-tracing methods that are restricted to triangle mesh geometry. Our research looks at the viability of procedural methods in the real-time setting. We give implementations of analytical and implicit geometry in the domain of the global illumination algorithms bi-directional path-tracing, and GPU Photon-Mapping – both of which we have adapted to the new ray-tracing shader stages, as shown in Figure 1. Despite procedural intersections being more expensive than triangle intersections in Nvidia’s RTX hardware, our results show that these descriptions still run at interactive rates within computationally expensive multi-pass ray-traced global illumination and demonstrate the practical benefits of the geometry.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123748453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing Transfer Leaming to 3D ResNet-18 for Alzheimer’s Disease Detection on MRI Images","authors":"Amir Ebrahimi, S. Luo, R. Chiong","doi":"10.1109/IVCNZ51579.2020.9290616","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290616","url":null,"abstract":"This paper focuses on detecting Alzheimer’s Disease (AD) using the ResNet-18 model on Magnetic Resonance Imaging (MRI). Previous studies have applied different 2D Convolutional Neural Networks (CNNs) to detect AD. The main idea being to split 3D MRI scans into 2D image slices, so that classification can be performed on the image slices independently. This idea allows researchers to benefit from the concept of transfer learning. However, 2D CNNs are incapable of understanding the relationship among 2D image slices in a 3D MRI scan. One solution is to employ 3D CNNs instead of 2D ones. In this paper, we propose a method to utilise transfer learning in 3D CNNs, which allows the transfer of knowledge from 2D image datasets to a 3D image dataset. Both 2D and 3D CNNs are compared in this study, and our results show that introducing transfer learning to a 3D CNN improves the accuracy of an AD detection system. After using an optimisation method in the training process, our approach achieved 96.88% accuracy, 100% sensitivity, and 93.75% specificity.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"558 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116275792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Graph-Based Approach to Automatic Convolutional Neural Network Construction for Image Classification","authors":"Gonglin Yuan, Bing Xue, Mengjie Zhang","doi":"10.1109/IVCNZ51579.2020.9290492","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290492","url":null,"abstract":"Convolutional neural networks (CNNs) have achieved great success in the image classification field in recent years. Usually, human experts are needed to design the architectures of CNNs for different tasks. Evolutionary neural network architecture search could find optimal CNN architectures automatically. However, the previous representations of CNN architectures with evolutionary algorithms have many restrictions. In this paper, we propose a new flexible representation based on the directed acyclic graph to encode CNN architectures, to develop a genetic algorithm (GA) based evolutionary neural network architecture, where the depth of candidate CNNs could be variable. Furthermore, we design new crossover and mutation operators, which can be performed on individuals of different lengths. The proposed algorithm is evaluated on five widely used datasets. The experimental results show that the proposed algorithm achieves very competitive performance against its peer competitors in terms of the classification accuracy and number of parameters.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126361372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shadow-based Light Detection for HDR Environment Maps","authors":"Andrew Chalmers, Taehyun Rhee","doi":"10.1109/IVCNZ51579.2020.9290734","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290734","url":null,"abstract":"High dynamic range (HDR) environment maps (EMs) are spherical textures containing HDR pixels used for illuminating virtual scenes with high realism. Detecting as few necessary pixels as possible within the EM is important for a variety of tasks, such as real-time rendering and EM database management. To address this, we propose a shadow-based algorithm for detecting the most dominant light sources within an EM. This algorithm takes into account the relative impact of all other light sources within the upper-hemisphere of the texture. This is achieved by decomposing an EM into superpixels, sorting the superpixels from brightest to least, and using ℓ0-norm minimisation to keep only the necessary superpixels that maintains the shadow quality of the EM with respect to the just noticeable difference (JND) principle. We show that our method improves upon prior methods in detecting as few lights as possible while still preserving the shadow-casting properties of EMs.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127445477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Wavefront Sensorless Tip/Tilt Removal method for Correcting Astronomical Images","authors":"P. Taghinia, Vishnu Anand Muruganandan, R. Clare, S. Weddell","doi":"10.1109/IVCNZ51579.2020.9290688","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290688","url":null,"abstract":"Images of astronomical objects captured by ground-based telescopes are distorted due to atmospheric turbulence. The phase of the atmospheric aberration is traditionally estimated by a wavefront sensor (WFS). This information is utilised by a deformable mirror through a control system to restore the image. However, in this paper, we utilise wavefront sensorless (WFSL) methods in which the wavefront sensor is absent. Given that the largest share of atmospheric turbulence energy is contained in the 2-axial tilt for small aperture telescopes, we use WFSL to specifically remove these two modes. This method is shown to be efficient in terms of both speed and accuracy.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131036719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI in Photography: Scrutinizing Implementation of Super-Resolution Techniques in Photo-Editors","authors":"Noor-ul-ain Fatima","doi":"10.1109/IVCNZ51579.2020.9290737","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290737","url":null,"abstract":"Judging the quality of a photograph from the perspective of a photographer we can ascertain resolution, symmetry, content, location, etc. as some of the factors that influence the proficiency of a photograph. The exponential growth in the allurement for photography impels us to discover ways to perfect an input image in terms of the aforesaid parameters. Where content and location are the immutable ones, attributes like symmetry and resolution can be worked upon. In this paper, I prioritized resolution as our cynosure and there can be multiple ways to refine it. Image super-resolution is progressively becoming a prerequisite in the fraternity of computer graphics, computer vision, and image processing. It’s the process of obtaining high-resolution images from their low-resolution counterparts. In my work, image super-resolution techniques like Interpolation, SRCNN (Super-Resolution Convolutional Neural Network), SRResNet (Super Resolution Residual Network), and GANs (Generative Adversarial Networks: Super-Resolution GAN-SRGAN and Conditional GAN-CGAN) were studied experimentally for post-enhancement of images in photography as employed by photo-editors, establishing the most coherent approach for attaining optimized super-resolution in terms of quality.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123908597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational Autoencoder for 3D Voxel Compression","authors":"Juncheng Liu, S. Mills, B. McCane","doi":"10.1109/IVCNZ51579.2020.9290656","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290656","url":null,"abstract":"3D scene sensing and understanding is a fundamental task in the field of computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels always requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid –in our application with collaborating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method by two applications: scene reconstruction and path planing.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121368114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Face Detection Algorithms on Mobile Devices","authors":"Yishi Guo, B. Wünsche","doi":"10.1109/IVCNZ51579.2020.9290542","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290542","url":null,"abstract":"Face detection is a fundamental task for many computer vision applications such as access control, security, advertisement, automatic payment, and healthcare. Due to technological advances mobile robots are becoming increasingly common in such applications (e.g. healthcare and security robots) and consequently there is a need for efficient and effective face detection methods on such platforms. Mobile robots have different hardware configurations and operating conditions from desktop applications, e.g. unreliable network connections and the need for lower power consumption. Hence results for face detection methods on desktop platforms cannot be directly translated to mobile platforms.We compare four common face detection algorithms, Viola-Jones, HOG, MTCNN and MobileNet-SSD, for use in mobile robotics using different face data bases. Our results show that for a typical mobile configuration (Nvidia Jetson TX2) Mobile-NetSSD performed best with 90% detection accuracy for the AFW data set and a frame rate of almost 10 fps with GPU acceleration. MTCNN had the highest precision and was superior for more difficult face data sets, but did not achieve real-time performance with the given implementation and hardware configuration.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"425 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132234027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Cherry Quality Using Siamese Networks","authors":"Yerren van Sint Annaland, Lech Szymanski, S. Mills","doi":"10.1109/IVCNZ51579.2020.9290674","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290674","url":null,"abstract":"The cherry industry is a rapidly growing sector of New Zealand’s export merchandise and, as such, the accuracy with which pack-houses can grade cherries during processing is becoming increasingly critical. Conventional computer vision systems are usually employed in this process, yet they fall short in many respects, still requiring humans to manually verify the grading. In this work, we investigate the use of deep learning to improve upon the traditional approach. The nature of the industry means that the grade standards are influenced by a range of factors and can change on a daily basis. This makes conventional classification approaches infeasible (as there are no fixed classes) so we construct a model to overcome this. We convert the problem from classification to regression, using a Siamese network trained with pairwise comparison labels. We extract the model embedded within to predict continuous quality values for the fruit. Our model is able to predict which of two similar quality fruit is better with over 88% accuracy, only 5% below the self-agreement of a human expert.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114391185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Action Recognition Using Deep Learning Methods","authors":"Zeqi Yu, W. Yan","doi":"10.1109/IVCNZ51579.2020.9290594","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290594","url":null,"abstract":"The goal of human action recognition is to identify and understand the actions of people in videos and export corresponding tags. In addition to spatial correlation existing in 2D images, actions in a video also own the attributes in temporal domain. Due to the complexity of human actions, e.g., the changes of perspectives, background noises, and others will affect the recognition. In order to solve these thorny problems, three algorithms are designed and implemented in this paper. Based on convolutional neural networks (CNN), Two-Stream CNN, CNN+LSTM, and 3D CNN are harnessed to identify human actions in videos. Each algorithm is explicated and analyzed on details. HMDB-51 dataset is applied to test these algorithms and gain the best results. Experimental results showcase that the three methods have effectively identified human actions given a video, the best algorithm thus is selected.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134215150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}