{"title":"Deep Learning Methods for Human Behavior Recognition","authors":"Jia Lu, M. Nguyen, W. Yan","doi":"10.1109/IVCNZ51579.2020.9290640","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290640","url":null,"abstract":"In this paper, we investigate the problem of human behavior recognition by using the state-of-the-art deep learning methods. In order to achieve sufficient recognition accuracy, both spatial and temporal information was acquired to implement the recognition in this project. We propose a novel YOLOv4 + LSTM network, which yields promising results for real-time recognition. For the purpose of comparisons, we implement Selective Kernel Network (SKNet) with attention mechanism. The key contributions of this paper are: (1) YOLOv4 + LSTM network is implemented to achieve 97.87% accuracy based on our own dataset by using spatiotemporal information from pre-recorded video footages. (2) The SKNet with attention model that earns the best accuracy of human behaviour recognition at the rate up to 98.7% based on multiple public datasets.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133638142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Learned State Representations for Atari","authors":"Adam Tupper, K. Neshatian","doi":"10.1109/IVCNZ51579.2020.9290609","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290609","url":null,"abstract":"Deep reinforcement learning, the combination of deep learning and reinforcement learning, has enabled the training of agents that can solve complex tasks from visual inputs. However, these methods often require prohibitive amounts of computation to obtain successful results. To improve learning efficiency, there has been a renewed focus on separating state representation and policy learning. In this paper, we investigate the quality of state representations learned by different types of autoencoders, a popular class of neural networks used for representation learning. We assess not only the quality of the representations learned by undercomplete, variational, and disentangled variational autoencoders, but also how the quality of the learned representations is affected by changes in representation size. To accomplish this, we also present a new method for evaluating learned state representations for Atari games using the Atari Annotated RAM Interface. Our findings highlight differences in the quality of state representations learned by different types of autoencoders and their robustness to reduction in representation size. Our results also demonstrate the advantage of using more sophisticated evaluation methods over assessing reconstruction quality.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129522291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle-Related Scene Segmentation Using CapsNets","authors":"Xiaoxu Liu, W. Yan, N. Kasabov","doi":"10.1109/IVCNZ51579.2020.9290664","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290664","url":null,"abstract":"Understanding of traffic scenes is a significant research problem in computer vision. In this paper, we present and implement a robust scene segmentation model by using capsule network (CapsNet) as a basic framework. We collected a large number of image samples related to Auckland traffic scenes of the motorway and labelled the data for multiple classifications. The contribution of this paper is that our model facilitates a better scene understanding based on matrix representation of pose and spatial relationship. We take a step forward to effectively solve the Picasso problem. The methods are based on deep learning and reduce human manipulation of data by completing the training process using only a small size of training data. Our model has the preliminary accuracy up to 74.61% based on our own dataset.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130402793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defects Detection in Highly Specular Surface using a Combination of Stereo and Laser Reconstruction","authors":"Arpita Dawda, M. Nguyen","doi":"10.1109/IVCNZ51579.2020.9290660","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290660","url":null,"abstract":"Product inspection is an indispensable tool of the current manufacturing process. It helps maintain the quality of the product and reduces manufacturing costs by eliminating scrap losses [1]. In the modern era, the inspection process also needs to be automatic, fast and accurate [2]. “Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis [3].” However, highly specular (mirror-like) surfaces are still proven to be the limitation of many state-of-art three-dimensional (3D) reconstruction approaches. The specularity of the outer surface makes it difficult to 3D reconstruct the product model accurately. Along with accurate measurements, it is also essential to detect defects such as dents, bumps, cracks and scratches present in a product. As these defects are palpable and are not visible by the camera, it is tough to detect them using vision-based inspection techniques in ambient lighting conditions. This paper presents an automated defect detection technique using the concepts of laser line projection and stereo vision. This research activity came up as an evolution of a previous study in which, the ideas of stereo-vision reconstruction and laser line projection were used, for accurate 3D measurement of highly specular surfaces. In this paper, the detection of three defect types (Dents, Scratches and Bumps) are examined in ambient lighting conditions. In the end, the output 3D profile of the defected product is compared with the non-defective product for accuracy evaluation.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128370790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class Probability-based Visual and Contextual Feature Integration for Image Parsing","authors":"Basim Azam, Ranju Mandal, Ligang Zhang, B. Verma","doi":"10.1109/IVCNZ51579.2020.9290686","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290686","url":null,"abstract":"Deep learning networks have become one of the most promising architectures for image parsing tasks. Although existing deep networks consider global and local contextual information of the images to learn coarse features individually, they lack automatic adaptation to the contextual properties of scenes. In this work, we present a visual and contextual feature-based deep network for image parsing. The main novelty is in the 3-layer architecture which considers contextual information and each layer is independently trained and integrated. The network explores the contextual features along with the visual features for class label prediction with class-specific classifiers. The contextual features consider the prior information learned by calculating the co-occurrence of object labels both within a whole scene and between neighboring superpixels. The class-specific classifier deals with an imbalance of data for various object categories and learns the coarse features for every category individually. A series of weak classifiers in combination with boosting algorithms are investigated as classifiers along with the aggregated contextual features. The experiments were conducted on the benchmark Stanford background dataset which showed that the proposed architecture produced the highest average accuracy and comparable global accuracy.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129632402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Portrait Segmentation of the Head and Upper Body","authors":"S. Loke, B. MacDonald, Matthew Parsons, B. Wünsche","doi":"10.1109/IVCNZ51579.2020.9290654","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290654","url":null,"abstract":"Portrait segmentation is the process whereby the head and upper body of a person is separated from the background of an image or video stream. This is difficult to achieve accurately, although good results have been obtained with deep learning methods which cope well with occlusion, pose and illumination changes. These are however, either slow or require a powerful system to operate in real-time. We present a new method of portrait segmentation called FaceSeg which uses fast DBSCAN clustering combined with smart face tracking that can replicate the benefits and accuracy of deep learning methods at a much faster speed. In a direct comparison using a standard testing suite, our method achieved a segmentation speed of 150 fps for a 640x480 video stream with median accuracy and F1 scores of 99.96% and 99.93% respectively on simple backgrounds, with 98.81% and 98.13% on complex backgrounds. The state-of-art deep learning based FastPortrait / Mobile Neural Network method achieved 15 fps with 99.95% accuracy and 99.91% F1 score on simple backgrounds, and 99.01% accuracy and 98.43 F1 score on complex backgrounds. An efficacy-boosted implementation for FaceSeg can achieve 75 fps with 99.23% accuracy and 98.79% F1 score on complex backgrounds.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"478 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134140386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning with Synthetic Data – a New Way to Learn and Classify the Pictorial Augmented Reality Markers in Real-Time","authors":"H. Le, M. Nguyen, W. Yan","doi":"10.1109/IVCNZ51579.2020.9290606","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290606","url":null,"abstract":"The idea of Augmented Reality (AR) appeared in the early 60s, which recently received a large amount of public attention. AR allows us to work, learn, play, and connect with the world around us both virtually and physically in real-time. However, picking the AR marker to match the users’ needs is one of the most challenging tasks due to different marker encryption/decryption methods and essential requirements. Barcode AR cards are fast and efficient, but they do not contain much visual information; pictorial coloured AR card, on the other hand, is slow and not reliable. This paper proposes a solution to obtain detectable arbitrary pictorial/colour AR cards in real-time by applying the benefit of machine learning and the power of synthetic data generation techniques. This technique solves the issue of labour-intensive tasks of manual annotations when building a massive training dataset of deep-learning. Thus, with a small number of input of the AR-enhanced target figures (as few as one for each coloured card), the synthetic data generated process will produce a deep-learning trainable dataset using computer-graphic rendering techniques (ten of thousands from just one image). Second, the generated dataset is then trained with a chosen object recognition convolutional neural network, acting as the AR marker tracking functionality. Our proposed idea works effectively well without modifying the original contents (of the chosen AR card). The benefits of using synthetic data generated techniques help us to improve the AR marker recognition accuracy and reduce the marker registration time. The trained model is capable of processing video sequences at approximately 25 frames per second without GPU Acceleration, which is suitable for AR experience on the mobile/web platform. We believed that it could be a promising low-cost AR approach in many areas, such as education and gaming.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116886257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically localising ROIs in hyperspectral images using background subtraction techniques","authors":"Munir Shah, V. Cave, Marlon dos Reis","doi":"10.1109/IVCNZ51579.2020.9290728","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290728","url":null,"abstract":"The use of snapshot hyperspectral cameras is becoming increasingly popular in agricultural scientific studies. One of the key steps in processing experimental hyperspectral data is to precisely locate the sample material under study and separate it from other background material, such as sampling instruments or equipment. This is very laborious work, especially for hyperspectral imaging scenarios where there might be a few hundred spectral images per sample. In this paper we propose a multiple-background modelling approach for automatically localising the Regions of Interest (ROIs) in hyperspectral images. The two key components of this method are i) modelling each spectral band individually and ii) applying a consensus algorithm to obtain the final ROIs for the whole hyperspectral image. Our proposed approach is able to achieve approximately a 14% improvement in ROIs detection in hyperspectral images compared to traditional video background modelling techniques.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"117 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129174574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tip-Tilt Mirror Control System for Partial Image Correction at UC Mount John Observatory","authors":"Jiayu Liu, Vishnu Anand Muruganandan, R. Clare, María Celeste Ramírez Trujillo, S. Weddell","doi":"10.1109/IVCNZ51579.2020.9290543","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290543","url":null,"abstract":"Astronomical images captured by ground-based telescopes, including at University of Canterbury Mount John Observatory, are distorted due to atmospheric turbulence. The major constituents of atmospheric distortion are tip-tilt aberrations. The solution to achieve higher resolution is to develop and install a tip-tilt mirror control system on ground-based telescopes. A real-time tip-tilt mirror control system measures and corrects for tip-tilt aberrations in optical wavefronts. It effectively minimises the perturbation of the star image when observing with the aid of a telescope. To the best of our knowledge, this is the first tip-tilt mirror control system to be applied at a New Zealand astronomical observatory. This would extend the possibilities of correcting higher-order aberrations for 0.5 to 1.0 metre class, ground-based telescopes.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122374628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plant Trait Segmentation for Plant Growth Monitoring","authors":"Abhipray Paturkar, G. S. Gupta, D. Bailey","doi":"10.1109/IVCNZ51579.2020.9290575","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290575","url":null,"abstract":"3D point cloud segmentation is an important step for plant phenotyping applications. The segmentation should be able to separate the various plant components such as leaves and stem robustly to enable traits to be measured. Also, it is important for the segmentation method to work on range of plant architectures with good accuracy and computation time. In this paper, we propose a segmentation method using Euclidean distance to segment the point cloud generated using a structure-from-motion algorithm. The proposed algorithm requires no prior information about the point cloud. Experimental results illustrate that our proposed method can effectively segment the plant point cloud irrespective of its architecture and growth stage. The proposed method has outperformed the standard methods in terms of computation time and segmentation quality.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132144853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}