{"title":"In Situ Cane Toad Recognition","authors":"D. Konovalov, Simindokht Jahangard, L. Schwarzkopf","doi":"10.1109/DICTA.2018.8615780","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615780","url":null,"abstract":"Cane toads are invasive, toxic to native predators, compete with native insectivores, and have a devastating impact on Australian ecosystems, prompting the Australian government to list toads as a key threatening process under the Environment Protection and Biodiversity Conservation Act 1999. Mechanical cane toad traps could be made more native-fauna friendly if they could distinguish invasive cane toads from native species. Here we designed and trained a Convolution Neural Network (CNN) starting from the Xception CNN. The XToadGmp toad-recognition CNN we developed was trained end-to-end using heat-map Gaussian targets. After training, XToadGmp required minimum image pre/post-processing and when tested on 720×1280 shaped images, it achieved 97.1% classification accuracy on 1863 toad and 2892 not-toad test images, which were not used in training.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131232544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Table Detection in Document Images using Foreground and Background Features","authors":"Saman Arif, F. Shafait","doi":"10.1109/DICTA.2018.8615795","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615795","url":null,"abstract":"Table detection is an important step in many document analysis systems. It is a difficult problem due to the variety of table layouts, encoding techniques and the similarity of tabular regions with non-tabular document elements. Earlier approaches of table detection are based on heuristic rules or require additional PDF metadata. Recently proposed methods based on machine learning have shown good results. This paper demonstrates performance improvement to these table detection techniques. The proposed solution is based on the observation that tables tend to contain more numeric data and hence it applies color coding/coloration as a signal for telling apart numeric and textual data. Deep learning based Faster R-CNN is used for detection of tabular regions from document images. To gauge the performance of our proposed solution, publicly available UNLV dataset is used. Performance measures indicate improvement when compared with best in-class strategies.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"447 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115606782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DGDI: A Dataset for Detecting Glomeruli on Renal Direct Immunofluorescence","authors":"Kun Zhao, Yuliang Tang, Teng Zhang, J. Carvajal, Daniel F. Smith, A. Wiliem, Peter Hobson, A. Jennings, B. Lovell","doi":"10.1109/DICTA.2018.8615769","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615769","url":null,"abstract":"With the growing popularity of whole slide scanners, there is a high demand to develop computer aided diagnostic techniques for this new digitized pathology data. The ability to extract effective information from digital slides, which serve as fundamental representations of the prognostic data patterns or structures, provides promising opportunities to improve the accuracy of automatic disease diagnosis. The recent advances in computer vision have shown that Convolutional Neural Networks (CNNs) can be used to analyze digitized pathology images providing more consistent and objective information to the pathologists. In this paper, to advance the progress in developing computer aided diagnosis systems for renal direct immunofluorescence test, we introduce a new benchmark dataset for Detecting Glomeruli on renal Direct Immunofluorescence (DGDI). To build the baselines, we investigate various CNN-based detectors on DGDI. Experiments demonstrate that DGDI well represents the challenges of renal direct immunofluorescence image analysis and encourages the progress in developing new approaches for understanding renal disease.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121888617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Left Ventricle Volume Measuring using Echocardiography Sequences","authors":"Yi Guo, S. Green, L. Park, Lauren Rispen","doi":"10.1109/DICTA.2018.8615766","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615766","url":null,"abstract":"Measuring left ventricle (LV) volume is a challenging problem in physiological study. One of the non-intrusive methods that is possible for this task is echocardiography. By extracting left ventricle area from ultrasound images, the volume can be approximated by the size of the left ventricle area. The core of the problem becomes the identification of the left ventricle in noisy images considering spatial temporal information. We propose adaptive sparse smoothing for left ventricle segmentation for each frame in echocardiography video for the benefit of robustness against strong speckle noise in ultrasound imagery. Then we adjust the identified left ventricle areas (as curves in polar coordinate system) further by a fixed rank principal component analysis as post processing. This method is tested on two data sets with labelled left ventricle areas for some frames by expert physiologist and compared against active contour based method. The experimental results show clearly that the proposed method has better accuracy than that of the competitor.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122182999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition","authors":"Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis","doi":"10.1109/DICTA.2018.8615791","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615791","url":null,"abstract":"Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the life quality of PD patients. Generally trained experts need to review the gait of a patient for clinical diagnosis, which is time consuming and subjective. Nowadays, automatic FoG identification from videos provides a promising solution to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task as FoG is very subtle and can be easily overlooked when being interfered with by irrelevant motion. In this paper, we propose a novel action recognition algorithm, namely convolutional 3D attention network (C3DAN), to address this issue by learning an informative region for more effective recognition. The network consists of two main parts: Spatial Attention Network (SAN) and 3-dimensional convolutional network (C3D). SAN aims to generate an attention region from coarse to fine, while C3D extracts discriminative features. Our proposed approach is able to localize attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate our proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained sensitivity of 68.2%, specificity of 80.8% and accuracy of 79.3%, which outperformed several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies detecting FoG from clinical videos.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122837838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Size-Invariant Attention Accuracy Metric for Image Captioning with High-Resolution Residual Attention","authors":"Zongjian Zhang, Qiang Wu, Yang Wang, Fang Chen","doi":"10.1109/DICTA.2018.8615788","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615788","url":null,"abstract":"Spatial visual attention mechanisms have achieved significant performance improvements for image captioning. To quantitatively evaluate the performances of attention mechanisms, the \"attention correctness\" metric has been proposed to calculate the sum of attention weights generated for ground truth regions. However, this metric cannot consistently measure the attention accuracy among the element regions with large size variance. Moreover, its evaluations are inconsistent with captioning performances across different fine-grained attention resolutions. To address these problems, this paper proposes a size-invariant evaluation metric by normalizing the \"attention correctness\" metric with the size percentage of the attended region. To demonstrate the efficiency of our size-invariant metric, this paper further proposes a high-resolution residual attention model that uses RefineNet as the Fully Convolutional Network (FCN) encoder. By using the COCO-Stuff dataset, we can achieve pixel-level evaluations on both object and \"stuff\" regions. We use our metric to evaluate the proposed attention model across four high fine-grained resolutions (i.e., 27×27, 40×40, 60×60, 80×80). The results demonstrate that, compared with the \"attention correctness\" metric, our size-invariant metric is more consistent with the captioning performances and is more efficient for evaluating the attention accuracy.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Enhancement for Face Recognition in Adverse Environments","authors":"D. Kamenetsky, Sau Yee Yiu, Martyn Hole","doi":"10.1109/DICTA.2018.8615793","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615793","url":null,"abstract":"Face recognition in adverse environments, such as at long distances or in low light conditions, remains a challenging task for current state-of-the-art face matching algorithms. The facial images taken in these conditions are often low resolution and low quality due to the effects of atmospheric turbulence and/or insufficient amount of light reaching the camera. In this work, we use an atmospheric turbulence mitigation algorithm (MPE) to enhance low resolution RGB videos of faces captured either at long distances or in low light conditions. Due to its interactive nature, MPE is tuned to work well in each specific environment. We also propose three image enhancement techniques that further improve the images produced by MPE: two for low light imagery (MPEf and fMPE) and one for long distance imagery (MPEh). Experimental results show that all three methods significantly improve the image quality and face recognition performance, allowing effective face recognition in almost complete darkness (at close range) or at distances up to 200m (in daylight).","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126059532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Classification using Deep Learning on Extremely Low-Resolution Time-of-Flight Data","authors":"Ana Daysi Ruvalcaba-Cardenas, T. Scoleri, Geoffrey Day","doi":"10.1109/DICTA.2018.8615877","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615877","url":null,"abstract":"This paper proposes two novel deep learning models for 2D and 3D classification of objects in extremely low-resolution time-of-flight imagery. The models have been developed to suit contemporary range imaging hardware based on a recently fabricated Single Photon Avalanche Diode (SPAD) camera with 64 χ 64 pixel resolution. Being the first prototype of its kind, only a small data set has been collected so far which makes it challenging for training models. To bypass this hurdle, transfer learning is applied to the widely used VGG-16 convolutional neural network (CNN), with supplementary layers added specifically to handle SPAD data. This classifier and the renowned Faster-RCNN detector offer benchmark models for comparison to a newly created 3D CNN operating on time-of-flight data acquired by the SPAD sensor. Another contribution of this work is the proposed shot noise removal algorithm which is particularly useful to mitigate the camera sensitivity in situations of excessive lighting. Models have been tested in both low-light indoor settings and outdoor daytime conditions, on eight objects exhibiting small physical dimensions, low reflectivity, featureless structures and located at ranges from 25m to 700m. Despite antagonist factors, the proposed 2D model has achieved 95% average precision and recall, with higher accuracy for the 3D model.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128587255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster-Based Crowd Movement Behavior Detection","authors":"Meng Yang, Lida Rashidi, A. S. Rao, S. Rajasegarar, Mohadeseh Ganji, M. Palaniswami, C. Leckie","doi":"10.1109/DICTA.2018.8615809","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615809","url":null,"abstract":"Crowd behaviour monitoring and prediction is an important research topic in video surveillance that has gained increasing attention. In this paper, we propose a novel architecture for crowd event detection, which comprises methods for object detection, clustering of various groups of objects, characterizing the movement patterns of the various groups of objects, detecting group events, and finding the change point of group events. In our proposed framework, we use clusters to represent the groups of objects/people present in the scene. We then extract the movement patterns of the various groups of objects over the video sequence to detect movement patterns. We define several crowd events and propose a methodology to detect the change point of the group events over time. We evaluated our scheme using six video sequences from benchmark datasets, which include events such as walking, running, global merging, local merging, global splitting and local splitting. We compared our scheme with state of the art methods and showed the superiority of our method in accurately detecting the crowd behavioral changes.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124800550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}