{"title":"Visual Object Tracking in Spherical 360° Videos: A Bridging Approach","authors":"Simon Finnie, Fang-Lue Zhang, Taehyun Rhee","doi":"10.1109/IVCNZ51579.2020.9290549","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290549","url":null,"abstract":"We present a novel approach for adapting existing visual object trackers (VOT) to work for equirectangular video, utilizing image reprojection. Our system can easily be integrated with existing VOT algorithms, significantly increasing the accuracy and robustness of tracking in spherical 360° environments without requiring retraining. Our adapted approach involves the orthographic projection of a subsection of the image centered around the tracked object each frame. Our projection reduces the distortion around the tracked object each frame, allowing the VOT algorithm to more easily track the object as it moves.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128492647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Identification of Diatom Morphology using Deep Learning","authors":"Dana Lambert, R. Green","doi":"10.1109/IVCNZ51579.2020.9290564","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290564","url":null,"abstract":"This paper proposes a method to automatically identify diatom frustules using nine morphological categories. A total of 7092 images from NIWA and ADIAC with related taxa data were used to create training and test sets. Different augmentations and image processing methods were used on the training set to see if this would increase accuracy. Several CNNs were trained over a total of 50 epochs and the highest accuracy model was saved based on the validation set. Resnet-50 produced the highest accuracy of 94%, which is not as accurate as a similar study that achieved 99%, although this was for a slightly different classification problem.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"29 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125697168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning approach for image retrieval tasks","authors":"Achref Ouni","doi":"10.1109/IVCNZ51579.2020.9290617","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290617","url":null,"abstract":"Several methods based on visual methods (BoVW, VLAD,…) or recent deep leaning methods try to solve the CBIR problem. Bag of visual words (BoVW) is one of most module used for both classification and image recognition. But, even with the high performance of BoVW, the problem of retrieving the image by content is still a challenge in computer vision. In this paper, we propose an improvement on a bag of visual words by increasing the accuracy of the retrieved candidates. In addition, we reduce the signature construction time by exploiting the powerful of the approximate nearest neighbor algorithms (ANNs). Experimental results will be applied to widely data sets (UKB, Wang, Corel 10K) and with different descriptors (CMI, SURF).","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127598139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a Virtual Environment Based Image Generation Tool for Neural Network Training","authors":"R. Arenas, P. Delmas, Alfonso Gastelum-Strozzi","doi":"10.1109/IVCNZ51579.2020.9290491","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290491","url":null,"abstract":"We present a computational tool to generate visual and descriptive data used as additional training images for neural networks involved in image recognition tasks. The work is inspired by the problem posed to acquire enough data, in order to train service robots, with the goal of improving the range of objects in the environment with which they can interact. The tool provides a framework that allows users to easily setup different environments with the visual information needed for the training, accordingly to their needs. The tool was developed with the Unity engine, and it was designed to be able to import external prefabs. These models are standardized and catalogued into lists, which are accessed to create more complex and diverse virtual environments. Another component of the tool adds an additional layer of complexity by creating randomized environments with different conditions (scale, position and orientation of objects, and environmental illumination). The performance of the created dataset was tested by training the information on the YOLO-V3 (You Only Look Once) architecture and testing on both artificial and real images.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133913845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image and Text fusion for UPMC Food-101 using BERT and CNNs","authors":"I. Gallo, Gianmarco Ria, Nicola Landro, Riccardo La Grassa","doi":"10.1109/IVCNZ51579.2020.9290622","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290622","url":null,"abstract":"The modern digital world is becoming more and more multimodal. Looking on the internet, images are often associated with the text, so classification problems with these two modalities are very common. In this paper, we examine multimodal classification using textual information and visual representations of the same concept. We investigate two main basic methods to perform multimodal fusion and adapt them with stacking techniques to better handle this type of problem. Here, we use UPMC Food-101, which is a difficult and noisy multimodal dataset that well represents this category of multimodal problems. Our results show that the proposed early fusion technique combined with a stacking-based approach exceeds the state of the art on the dataset used.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114242894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Melanoma and Nevi Classification using Convolution Neural Networks","authors":"R. Grove, R. Green","doi":"10.1109/IVCNZ51579.2020.9290736","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290736","url":null,"abstract":"Early identification of melanoma skin cancer is vital for the improvement of patients’ prospects of five year disease free survival. The majority of malignant skin lesions present at a general practice level where a diagnosis is based on a clinical decision algorithm. As a false negative diagnosis is an unacceptable outcome, clinical caution tends to result in a low positive predictive value of as low at 8%. There has been a large burden of surgical excisions that retrospectively prove to have been unnecessary.This paper proposes a method to identify melanomas in dermoscopic images using a convolution neural network (CNN). The proposed method implements transfer learning based on the ResNet50 CNN, pretrained using the ImageNet dataset. Datasets from the ISIC Archive were implemented during training, validation and testing. Further tests were performed on a smaller dataset of images taken from the Dermnet NZ website and from recent clinical cases still awaiting histological results to indicate the trained network’s ability to generalise to real cases. The 86% test accuracy achieved with the proposed method was comparable to the results of prior studies but required significantly less pre-processing actions to classify a lesion and was not dependant on consistent image scaling or the presence of a scale on the image. This method also improved on past research by making use of all of the information present in an image as opposed to focusing on geometric and colour-space based aspects independently.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116126667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heating Patterns Recognition in Industrial Microwave-Processed Foods","authors":"Sowmya Kasturi, S. L. Moan, D. Bailey, Jeremy Smith","doi":"10.1109/IVCNZ51579.2020.9290639","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290639","url":null,"abstract":"Recognition or identification of hot and cold spot heating patterns in microwave-processed pre-packaged food products is crucial to determine experimental repeatability and design better and safer food treatment systems. This review focuses on computer vision-based methods for heating patterns recognition from the literature along with their limitations. A preliminary kinetics study to correlate colour to varied timetemperature combinations is also discussed.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123411317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isotropic Remeshing by Dynamic Voronoi Tessellation on Voxelized Surface","authors":"Ashutosh Soni, Partha Bhowmick","doi":"10.1109/IVCNZ51579.2020.9290614","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290614","url":null,"abstract":"A novel algorithm for isotropic remeshing of a triangle mesh is presented in this paper. The algorithm is designed to work on a voxelized surface and integrates several novel ideas. One such is the notion of functional partitioning that aids in uniform distribution of seeds for initializing the process of dynamic Voronoi tessellation (DVT). The concept of DVT is also novel and found to be quite effective for iteratively transforming the input mesh into an isotropic mesh while keeping the tessellation aligned with the surface geometry. In each iteration, a Voronoi energy field is used to rearrange the seeds and to recreate the DVT. Over successive iterations, the DVT is found to keep on improving the mesh isotropy without compromising with the surface features. The Delaunay triangles corresponding to the final tessellation are further subdivided in high-curvature regions. The resultant mesh is finally projected back onto the original mesh in order to minimize the Hausdorff error. As our algorithm works in voxel space, it is readily implementable in GPU. Experimental results on various datasets demonstrate its efficiency and robustness.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121795086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wavefront reconstruction with the cone sensor","authors":"R. Clare, B. Engler, S. Weddell","doi":"10.1109/IVCNZ51579.2020.9290735","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290735","url":null,"abstract":"Wavefronts of light from celestial objects are aberrated by Earth’s evolving atmosphere, causing images captured by ground-based telescopes to be distorted. The slope of the phase of the wavefront can be estimated by a pyramid wavefront sensor, which subdivides the complex field at the focal plane of the telescope, producing four images of the aperture. The cone wavefront sensor is the extension of the pyramid sensor to having an infinite number of sides, and produces an annulus of intensity rather than four images. We propose and compare the following methods for reconstructing the wavefront from the intensity measurements from the cone sensor: (1) use the entire aperture image, (2) use the pixels inside the intensity annulus only, (3) create a map of slopes by subtracting the slice of annulus 180 degrees opposite, (4) create x and y slopes by cutting out pseudo-apertures around the annulus, and (5) use the inverse Radon transform of the intensity annulus converted to polar co-ordinates. We find via numerical simulation with atmospheric phase screens that methods (1) and (2) provide the best wavefront estimate, methods (3) and (4) the smallest interaction matrices, while method (5) allows direct reconstruction without an interaction matrix.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126263357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PProCRC: Probabilistic Collaboration of Image Patches for Fine-grained Classification","authors":"Tapabrata (Rohan) Chakraborty, B. McCane, S. Mills, U. Pal","doi":"10.1109/IVCNZ51579.2020.9290537","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290537","url":null,"abstract":"We present a conditional probabilistic framework for collaborative representation of image patches. It incorporates background compensation and outlier patch suppression into the main formulation itself, thus doing away with the need for pre-processing steps to handle the same. A closed form non-iterative solution of the cost function is derived. The proposed method (PProCRC) outperforms earlier CRC formulations: patch based (PCRC, GP-CRC) as well as the state-of-the-art probabilistic (ProCRC and EProCRC) on three fine-grained species recognition datasets (Oxford Flowers, Oxford-IIIT Pets and CUB Birds) using two CNN backbones (Vgg-19 and ResNet-50).","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131484430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}