{"title":"Identifying Bikers Without Helmets Using Deep Learning Models","authors":"Md. Iqbal Hossain, Raghib Barkat Muhib, Amitabha Chakrabarty","doi":"10.1109/DICTA52665.2021.9647170","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647170","url":null,"abstract":"Inspired by the recent progress in Computer Vision, we introduce a real-time smart surveillance system which uses Computer Vision and Deep Learning algorithms to identify bikers without helmets and retrieves registration numbers from detected license plates using Tesseract OCR along with necessary Computer Vision techniques and libraries. The video dataset was collected from the busiest roads of Dhaka, Bangladesh in 720p HD resolution at 30 fps. Deep Learning framework Tensorflow's SSD Mobilenet V2 and Faster R-CNN inception V2 models were used for object detection. We validated the use of our system on our dataset which gave 90%, 55%, 80%, 95% accuracy for helmet, human, bike and number plate respectively in SSD Mobilenet V2 and 92%, 58%, 81%, 96% for helmet, human, bike and number plate respectively in Faster RCNN inception V2. The number plate recognition has an accuracy of 98%. The retrieved registration numbers are then stored in a database for further identification of the bikers without helmets. The proposed system outperforms other related real-time helmet detection systems and license plate recognition models. The system achieved a high frames per second(FPS) rate of approximately 45 on NVIDIA RTX2080 GPU and was able to perform successfully even when there were 6 bikes in a frame. Another contribution is that, our dataset has a high biker density per frame and 5626 images were labeled with 24465 bounding boxes. The dataset can be used for further real-time surveillance system research effectively.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124605672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AF-Net: All-scale Feature Fusion Network for Road Extraction from Remote Sensing Images","authors":"Shide Zou, Fengchao Xiong, Haonan Luo, Jianfeng Lu, Y. Qian","doi":"10.1109/DICTA52665.2021.9647235","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647235","url":null,"abstract":"Road extraction from high-resolution remote sensing images (RSIs) is a challenging task due to occlusion, irregular structures, complex background, etc. A typical solution for road extraction is semantic segmentation that tries to segment the road region directly from the background region at the pixel level. Because of the narrow and slender structures of roads, high-quality multi-resolution and diverse semantic feature representations are necessary for this task. To this end, this paper introduces an all-scale feature fusion network named as AF-Net to extract roads from RSIs. AF-Net adopts an encoder-decoder architecture, whose encoder and decoder are connected by the introduced all-scale feature fusion module (AF-module). AF-module contains multiple feature fusion stages, corresponding to features of different scales. At each stage of feature fusion, all-scale all-level feature representations are employed to recursively integrate the features from two paths. One path propagates the high-resolution spatial features to the current scale feature and another path merges the current scale feature with high-level semantic features. In this way, we effectively employ all-scale features with varied spatial information and semantic information in each fusion stage, facilitating producing more accurate spatial information and richer semantic information for road extraction. Moreover, a convolutional block attention module is embedded into AF-module to suppress unconducive features from the surrounding background and improve the quality of extracted roads. Due to the features with richer semantic information and more precise spatial information, the proposed AF-Net outperforms other state-of-the-art methods on two benchmark datasets.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130386341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of Forest Power lines From LiDAR point cloud Data","authors":"Nosheen Munir, M. Awrangjeb, Bela Stantic","doi":"10.1109/DICTA52665.2021.9647062","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647062","url":null,"abstract":"This paper presents a hierarchical method for high voltage power lines extraction and reconstruction. To begin, the potential power lines points are differentiated from the pylons and other plants using visual-based characteristics, i.e., power lines are non-vertical objects since they dangle above the ground and have space between them, while vegetation and pylons are vertical objects. The power line points are further refined from noise and surrounding vegetation points using the Hough transform. The pylons are detected from vertical points using their shape and area properties and used to obtain the power lines in the form of span points at their locations. For bundles extraction, the span points are divided into several segments and binary mask is produced from each segment. Each binary mask is utilised to link up the bundle segments using image-based techniques and to rebuild the broken/missing section of power lines. Finally, power lines are modelled in 3D polynomial curve models. The proposed method is tested on different spans from three different datasets and object-based evaluation of the proposed technique yields promising results.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132112385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-based Long-term Modeling for Deep Visual Odometry","authors":"Sangni Xu, Hao Xiong, Qiuxia Wu, Zhiyong Wang","doi":"10.1109/DICTA52665.2021.9647140","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647140","url":null,"abstract":"Visual odometry (VO) aims to determine the positions of a moving camera from an image sequence it acquired. It has been extensively utilized in many applications such as AR/VR, autonomous driving, and robotics. Conventional VO methods largely rely on hand-crafted features and data association that are in fact unreliable and suffering from fast motions. Therefore, learning-based VO utilizes neural networks mapping an image sequence to corresponding camera poses directly. Most existing learning-based methods also integrate with additional Long Short-Term Memory (LSTM) networks to model the temporal context across images, since the camera pose estimation of an image in VO is highly relevant to other images in the same sequence. However, traditional LSTM is limited to model short-term dependency rather than long-term temporal context or global information. To mitigate this issue, we propose an attention based long-term modelling approach by devising a new fusion gate into the LSTM cell. Our method consists of two modules: convolutional motion encoder and recurrent global motion refinement module. Specifically, the convolutional motion encoder extracts from images motion features which are then fused by the refinement module with more long-term temporal information. In the refinement module, the devised fusion gate generates long-term temporal information in two phases: (1) extracting correlated long-term information from previous predictions through a devised attention module; and (2) updating the current hidden state with extracted long-term information. As a result, it enables our model to gather long-term temporal information and further enhance estimation accuracy. We comprehensively evaluate our proposed method on two public datasets, KITTI and Oxford RobotCar. The experimental results demonstrate the effectiveness and superiority of our method over the baseline model.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127468955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduction of Feature Contamination for Hyper Spectral Image Classification","authors":"Sutharsan Mahendren, Tharindu Fernando, S. Sridharan, Peyman Moghadam, C. Fookes","doi":"10.1109/DICTA52665.2021.9647153","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647153","url":null,"abstract":"Motivated by the power of the contrastive learning process, in this paper we present a novel supervised contrastive learning network add-on which reduces the misclassifications of the state-of-the-art Hyper Spectral Image (HSI) classification models. We observe that a significant number of misclassification of these HSI classification models occur at the class borders where there exist multiple different classes in the neighbourhood. We believe this is due to the contamination of feature space in the deeper layers of the CNN network. To mitigate this deficiency we propose a novel supervisory signal design that ‘pulls' the features derived from the same class as of class of the centre pixel together, while ‘pushing’ the features of other classes far apart. This yields a novel trainable neural network module for Reducing Feature Contamination (RFC). The proposed module architecture is model agnostic and can be coupled with different CNN based architectures where it is required to alleviate the contamination of spectral signatures from neighbouring pixels of other classes. Through extensive evaluations using the state-of-the-art SSRN, HybridSN, A2S2K-ResNet and WHU-Hi, Indian Pines and the PaviaU datasets we have demonstrated the utility of the proposed RFC module.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129618051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-based cattle identification and action recognition","authors":"Chuong H. Nguyen, Dadong Wang, Karl von Richter, P. Valencia, F. Alvarenga, G. Bishop-Hurley","doi":"10.1109/DICTA52665.2021.9647417","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647417","url":null,"abstract":"We demonstrate a working prototype for the monitoring of cow welfare by automatically analysing the animal behaviours. Deep learning models have been developed and tested with videos acquired in a farm, and a precision of 81.2% has been achieved for cow identification. An accuracy of 84.4% has been achieved for the detection of drinking events, and 94.4% for the detection of grazing events. Experimental results show that the proposed deep learning method can be used to identify the behaviours of individual animals to enable automated farm provenance. Our raw and ground-truth dataset will be released as the first public video dataset for cow identification and action recognition. Recommendations for further development are also provided.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114624482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compositional Feature Embedding and Similarity Metric for Ultra-Fine-Grained Visual Categorization","authors":"Yajie Sun, Miaohua Zhang, Xiaohan Yu, Yi Liao, Yongsheng Gao","doi":"10.1109/DICTA52665.2021.9647081","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647081","url":null,"abstract":"Fine-grained visual categorization (FGVC), which aims at classifying objects with small inter-class variances, has been significantly advanced in recent years. However, ultra-fine-grained visual categorization (ultra-FGVC), which targets at identifying subclasses with extremely similar patterns, has not received much attention. In ultra-FGVC datasets, the samples per category are always scarce as the granularity moves down, which will lead to overfitting problems. Moreover, the difference among different categories is too subtle to distinguish even for professional experts. Motivated by these issues, this paper proposes a novel compositional feature embedding and similarity metric (CECS). Specifically, in the compositional feature embedding module, we randomly select patches in the original input image, and these patches are then replaced by patches from the images of different categories or masked out. Then the replaced and masked images are used to augment the original input images, which can provide more diverse samples and thus largely alleviate overfitting problem resulted from limited training samples. Besides, learning with diverse samples forces the model to learn not only the most discriminative features but also other informative features in remaining regions, enhancing the generalization and robustness of the model. In the compositional similarity metric module, a new similarity metric is developed to improve the classification performance by narrowing the intra-category distance and enlarging the inter-category distance. Experimental results on two ultra-FGVC datasets and one FGVC dataset with recent benchmark methods consistently demonstrate that the proposed CECS method achieves the state-of-the-art performance.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116892875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mask-Guided Feature Extraction and Augmentation for Ultra-Fine-Grained Visual Categorization","authors":"Zicheng Pan, Xiaohan Yu, Miaohua Zhang, Yongsheng Gao","doi":"10.1109/DICTA52665.2021.9647389","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647389","url":null,"abstract":"While the fine-grained visual categorization (FGVC) problems have been greatly developed in the past years, the Ultra-fine-grained visual categorization (Ultra-FGVC) problems have been understudied. FGVC aims at classifying objects from the same species (very similar categories), while the Ultra-FGVC targets at more challenging problems of classifying images at an ultra-fine granularity where even human experts may fail to identify the visual difference. The challenges for Ultra-FGVC mainly come from two aspects: one is that the Ultra-FGVC often arises overfitting problems due to the lack of training samples; and another lies in that the inter-class variance among images is much smaller than normal FGVC tasks, which makes it difficult to learn discriminative features for each class. To solve these challenges, a mask-guided feature extraction and feature augmentation method is proposed in this paper to extract discriminative and informative regions of images which are then used to augment the original feature map. The advantage of the proposed method is that the feature detection and extraction model only requires a small amount of target region samples with bounding boxes for training, then it can automatically locate the target area for a large number of images in the dataset at a high detection accuracy. Experimental results on two public datasets and ten state-of-the-art benchmark methods consistently demonstrate the effectiveness of the proposed method both visually and quantitatively.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116854780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resetting the baseline: CT-based COVID-19 diagnosis with Deep Transfer Learning is not as accurate as widely thought","authors":"F. Altaf, S. M. Islam, Naveed Akhtar","doi":"10.1109/DICTA52665.2021.9647158","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647158","url":null,"abstract":"Deep learning is gaining instant popularity in computer aided diagnosis of COVID-19. Due to the high sensitivity of Computed Tomography (CT) to this disease, CT-based COVID-19 detection with visual models is currently at the forefront of medical imaging research. Outcomes published in this direction are frequently claiming highly accurate detection under deep transfer learning. This is leading medical technologists to believe that deep transfer learning is the mainstream solution for the problem. However, our critical analysis of the literature reveals an alarming performance disparity between different published results. Hence, we conduct a systematic thorough investigation to analyze the effectiveness of deep transfer learning for COVID-19 detection with CT images. Exploring 14 state-of-the-art visual models with over 200 model training sessions, we conclusively establish that the published literature is frequently overestimating transfer learning performance for the problem, even in the prestigious scientific sources. The roots of overestimation trace back to inappropriate data curation. We also provide case studies that consider more realistic scenarios, and establish transparent baselines for the problem. We hope that our reproducible investigation will help in curbing hype-driven claims for the critical problem of COVID-19 diagnosis, and pave the way for a more transparent performance evaluation of techniques for CT-based COVID-19 detection.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129541098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}