{"title":"Full Series Algorithm of Automatic Building Extraction and Modelling From LiDAR Data","authors":"Fayez Tarsha Kurdi, Zahra Gharineiat, Glenn Campbell, E. Dey, M. Awrangjeb","doi":"10.1109/DICTA52665.2021.9647313","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647313","url":null,"abstract":"This paper proposes an algorithm that automatically links the automatic building classification and modelling algorithms. To make this connection, the suggested algorithm applies two filters to the building classification results, enabling the failed cases of the classification algorithm to be processed. In this context, it filters the noisy terrain class and analyses the remaining points to detect missing buildings. Moreover, it filters the detected buildings to eliminate all undesirable points, such as those associated with trees overhanging the building roof, the surrounding terrain and the façade points. In the modelling algorithm, the error map matrix is analysed to recognize the failed cases of the building modelling algorithm, and these buildings are modelled with flat roofs. Finally, the region growing algorithm is applied to the building mask to detect each building and pass it to the modelling algorithm. The accuracy analysis of the classification and modelling algorithms within the global algorithm shows them to be highly effective: the total error of the building classification algorithm is 0.01%, and only one building in the sample dataset is rejected by the modelling algorithm, and even that is modelled, albeit with a flat roof. 
Most of the buildings have Segmentation Accuracy and Quality factors below 5% (error less than 5%), indicating an excellent evaluation.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"77 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127185692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
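The final detection step in the abstract above applies region growing to a binary building mask to isolate each building before modelling. As a minimal, hypothetical sketch (not the authors' implementation), a 4-connected breadth-first flood fill over a 2D mask performs this kind of region labelling:

```python
from collections import deque

def region_growing(mask):
    """Label 4-connected regions of nonzero cells in a 2D binary mask.

    Returns a dict mapping region label -> list of (row, col) points,
    so each detected region can be handed off individually, as a
    building would be passed to a modelling stage.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    regions = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                points = []
                while queue:
                    y, x = queue.popleft()
                    points.append((y, x))
                    # Grow the region into unlabelled 4-neighbours.
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                regions[next_label] = points
    return regions
```

On a mask with two separated blobs, the function returns two labelled regions, each with its member pixels.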
{"title":"Efficient DNN-Based Classification of Whole Slide Gram Stain Images for Microbiology","authors":"Sarah Alhammad, Kun Zhao, A. Jennings, Peter Hobson, Daniel F. Smith, Brett Baker, Justin Staweno, B. Lovell","doi":"10.1109/DICTA52665.2021.9647415","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647415","url":null,"abstract":"The interpretation of conventional glass Gram stain microscopy slides is both subjective and time consuming. The first step towards Digital Pathology is to convert Gram slides into Whole Slide Images (WSIs); this image capture process is itself extremely challenging due to the need for ×100 objectives with oil immersion in conventional microscopy. For high-volume pathology laboratories, an Artificial Intelligence (AI) system based on deep neural networks (DNNs) operating on WSIs could be extremely beneficial in alleviating the problems faced by conventional pathology at scale. Such a system would ensure accuracy, reduce the workload of pathologists, and enhance both objectivity and efficiency. A review of the pathology literature shows that methods and datasets relating to the very important Gram stain test are exceedingly rare compared to those for other pathology tests such as breast cancer, lymphoma and colorectal cancer. This data scarcity has likely hindered research on Gram stain automation. This paper aims to use deep learning to classify Gram-positive cocci bacteria subtypes, and to study the effect of downsampling, data augmentation, and image size on both classification accuracy and speed. Experiments were conducted on a novel dataset of three bacteria subtypes provided by Sullivan Nicolaides Pathology (SNP): Staphylococcus, Enterococcus and Streptococcus. The subimages are obtained from blood culture WSIs captured by the in-house SNP MicroLab using a ×63 objective without coverslips or oil immersion. 
Our results show that a DNN-based classifier distinguishes between these bacteria subtypes with high classification accuracy.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervision, Remote Sensing and Abstraction: Representation Learning Across 3 Million Locations","authors":"Sachith Seneviratne, K. Nice, J. Wijnands, Mark Stevenson, Jason Thompson","doi":"10.1109/DICTA52665.2021.9647061","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647061","url":null,"abstract":"Self-supervision based deep learning classification approaches have received considerable attention in the academic literature. However, the performance of such methods on remote sensing imagery domains remains under-explored. In this work, we explore contrastive representation learning methods on the task of imagery-based city classification, an important problem in urban computing. We use satellite and map imagery across 2 domains, 3 million locations and more than 1500 cities. We show that self-supervised methods can build a generalizable representation from as few as 200 cities, with representations achieving over 95% accuracy in unseen cities with minimal additional training. We also find that, for remote sensing imagery, the domain discrepancy between natural imagery and abstract imagery induces a significant performance gap between such methods and supervised methods. 
We compare all analyses against existing supervised models from the academic literature and open-source our models (https://github.com/sachith500/self-supervision-remote-sensing-abstraction) for broader usage and further criticism.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"81 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123274559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Supervised 3D Hand Shape and Pose Estimation with Label Propagation","authors":"Samira Kaviani, Amir M. Rahimi, R. Hartley","doi":"10.1109/DICTA52665.2021.9647255","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647255","url":null,"abstract":"To obtain 3D annotations, we are restricted to controlled environments or synthetic datasets, leading us to 3D datasets with less generalizability to real-world scenarios. To tackle this issue in the context of semi-supervised 3D hand shape and pose estimation, we propose the Pose Alignment network to propagate 3D annotations from labelled frames to nearby unlabelled frames in sparsely annotated videos. We show that incorporating the alignment supervision on pairs of labelled-unlabelled frames allows us to improve the pose estimation accuracy. Besides, we show that the proposed Pose Alignment network can effectively propagate annotations on unseen sparsely labelled videos without fine-tuning.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"217 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113983451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-stratification feature selection for diagnostic analysis of Alzheimer's disease","authors":"Lin Zhang, Bowen Xin, Shaozhen Yan, Chaojie Zheng, Yun Zhou, Jie Lu, Xiuying Wang","doi":"10.1109/DICTA52665.2021.9647043","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647043","url":null,"abstract":"In current neuroimaging analysis, feature selection mainly focuses on analysis within single brain regions. However, the fact that brain activities are usually associated with multiple brain regions highlights the importance of multi-brain-region interaction, which is underexplored. To address this challenge, we propose a multi-stratification feature selection framework for analysing multiple brain regions in Magnetic Resonance Imaging (MRI). This framework consists of two major modules: an intra-Region of Interest (ROI) module and an inter-ROI module. The intra-ROI module selects representative features for each brain region by analysing both the statistical difference of features and the classifier performance of the candidate subset. The inter-ROI module employs an evaluation function to guide the search, sequentially adding features from brain regions based on their predictive capacity. Only relevant features with maximum joint significance that improve the evaluation performance are selected in this module. The proposed framework was validated on the diagnostic task of Alzheimer's disease. T1-MR images were collected from 196 Alzheimer's disease patients and 259 normal control subjects. 
The experiments demonstrated that the proposed multi-stratification feature selection outperformed the state-of-the-art single-brain-region analysis and radiomics early-integration methods applied to multiple brain regions, achieving an AUC of 0.913.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134228172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
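The inter-ROI module described in the abstract above greedily adds features based on predictive capacity, keeping only additions that improve an evaluation function. A minimal sketch of such sequential forward selection, assuming a caller-supplied `score_fn` (a hypothetical stand-in, not the paper's actual evaluation function):

```python
def forward_select(features, score_fn, max_feats=None):
    """Greedy sequential forward selection.

    Repeatedly adds the candidate feature that most improves score_fn
    over the currently selected subset, stopping when no candidate
    improves the score (or when max_feats is reached).
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining and (max_feats is None or len(selected) < max_feats):
        # Score every candidate extension of the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        score, feat = max(scored)
        if score <= best_score:
            break  # no candidate improves the evaluation; stop
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score
```

With a toy score that rewards two target features and mildly penalises subset size, the search recovers exactly the target subset and stops.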
{"title":"A Seq2seq-based Model with Global Semantic Context for Scene Text Recognition","authors":"Yi-Li Huang, Shilin Wang, Chengyu Gu, Zheng Huang, Kai Chen","doi":"10.1109/DICTA52665.2021.9647413","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647413","url":null,"abstract":"Scene text recognition (STR), with its various applications, has become a popular research topic. With the advent of deep learning, many sequence-to-sequence (seq2seq) models have been proposed. However, the Teacher-Forcing training method used in seq2seq models gives rise to the problem of exposure bias. Moreover, the autoregressive decoding manner limits the ability of seq2seq models to utilize future semantic information. To solve these problems, a new Transformer-based network is proposed in this paper. A Re-Embedding Layer with a sampling module is introduced to overcome the problem of exposure bias, and a context fusion module (CFM) is designed to model global context information. Experimental results on several benchmarks demonstrate the effectiveness of the proposed method in scene text recognition.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122476804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning based stereo cost aggregation on a small dataset","authors":"Rongcheng Wu, Changming Sun, Zhaoying Liu, A. Sowmya","doi":"10.1109/DICTA52665.2021.9647104","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647104","url":null,"abstract":"Deep learning (DL) has been used in many computer vision tasks, including stereo matching. However, DL is data hungry, and acquiring a large number of highly accurate real-world training images for stereo matching is too expensive in practice. The majority of studies rely on large simulated datasets during training, which inevitably results in domain shift problems that are commonly compensated for by fine-tuning. This work proposes a recursive 3D convolutional neural network (CNN) to improve the accuracy of DL based stereo matching in real-world scenarios with a small set of available images, without having to use a large simulated dataset and without fine-tuning. In addition, we propose a novel scale-invariant feature transform (SIFT) based adaptive window for matching cost computation, a crucial step in the stereo matching pipeline, to enhance accuracy. Extensive end-to-end comparative experiments demonstrate the superiority of the proposed recursive 3D CNN and SIFT based adaptive windows. Our work achieves effective generalization, corroborated by training solely on the indoor Middlebury Stereo 2014 dataset and validating on the outdoor KITTI 2012 and KITTI 2015 datasets. 
As a comparison, our bad-4.0 error is 24.2, which is on par with the AANet (CVPR 2020) method according to the public evaluation report from the Middlebury Stereo Evaluation Benchmark.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129008993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation for Plant Organ Detection with Style Transfer","authors":"Chrisbin James, Yanyang Gu, S. Chapman, Wei Guo, Etienne David, S. Madec, A. Potgieter, Anders Eriksson","doi":"10.1109/DICTA52665.2021.9647293","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647293","url":null,"abstract":"Deep learning based detection of sorghum panicles has been proposed to replace manual counting in field trials. However, model performance is highly sensitive to domain shift between training datasets associated with differences in genotypes, field conditions, and lighting conditions. As labelling such datasets is expensive and laborious, we propose a Contrastive Unpaired Translation (CUT) based domain adaptation pipeline to improve detection performance on new datasets, including those of completely different crop species. First, the original dataset is translated into other styles using CUT trained on unlabelled datasets from other domains. Labels are then corrected after the new-domain dataset is synthesized. Finally, detectors are retrained on the synthesized dataset. Experiments show that, in the case of sorghum panicles, the accuracy of models trained with synthetic images improves by fifteen to twenty percent. Furthermore, the models are more robust to changes in prediction thresholds. 
This demonstrates the effectiveness of the pipeline.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129162618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Spatio-temporal Action Localization for Surveillance Videos","authors":"Morgan Liang, Xun Li, Sandersan Onie, M. Larsen, A. Sowmya","doi":"10.1109/DICTA52665.2021.9647106","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647106","url":null,"abstract":"We present an improved spatiotemporal action localization framework that operates in an online manner. Current state-of-the-art approaches have achieved remarkable results, mainly due to advancements in action recognition models. These approaches commonly follow a two-stage pipeline consisting of a region proposal stage and an action classification stage. Recent improvements in spatiotemporal action localization models have focused on the action classification stage; as a result, the outputs generated in the region proposal stage are suboptimal. We believe that the proposal stage remains a crucial component in determining overall model performance. Therefore, we adopt a tracking model in place of existing proposal models to generate more accurate and robust regions of interest (RoI). We evaluate our approach on a private CCTV surveillance dataset and on the challenging JHMDB-21 benchmark. We achieve promising results on our private dataset and good results on the JHMDB-21 benchmark.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115757677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Modal Visual Question Answering for Remote Sensing Data","authors":"Rafael Felix, Boris Repasky, Samuel Hodge, Reza Zolfaghari, Ehsan Abbasnejad, J. Sherrah","doi":"10.1109/DICTA52665.2021.9647287","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647287","url":null,"abstract":"While querying of structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation based on answering natural language questions about satellite images that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a cross-modal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133956430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}