D. Konovalov, Simindokht Jahangard, L. Schwarzkopf
{"title":"In Situ Cane Toad Recognition","authors":"D. Konovalov, Simindokht Jahangard, L. Schwarzkopf","doi":"10.1109/DICTA.2018.8615780","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615780","url":null,"abstract":"Cane toads are invasive, toxic to native predators, compete with native insectivores, and have a devastating impact on Australian ecosystems, prompting the Australian government to list toads as a key threatening process under the Environment Protection and Biodiversity Conservation Act 1999. Mechanical cane toad traps could be made more native-fauna friendly if they could distinguish invasive cane toads from native species. Here we designed and trained a Convolutional Neural Network (CNN) starting from the Xception CNN. The XToadGmp toad-recognition CNN we developed was trained end-to-end using heat-map Gaussian targets. After training, XToadGmp required minimal image pre/post-processing and, when tested on 720×1280 images, achieved 97.1% classification accuracy on 1863 toad and 2892 not-toad test images, which were not used in training.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131232544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
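The abstract above mentions training with heat-map Gaussian targets but does not say how they are built. The following is a minimal sketch of one common construction, a 2D Gaussian centred on the object; the function names, the grid size, the centre position, `sigma`, and the all-zero target for not-toad images are all assumptions for illustration, not details taken from the paper.

```python
import math


def gaussian_heatmap(height, width, center_row, center_col, sigma):
    """Return a height x width grid holding a 2D Gaussian that peaks at 1.0."""
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [
            math.exp(-((r - center_row) ** 2 + (c - center_col) ** 2) / two_sigma_sq)
            for c in range(width)
        ]
        for r in range(height)
    ]


def blank_heatmap(height, width):
    """All-zero target, assumed here for an image that contains no toad."""
    return [[0.0] * width for _ in range(height)]


# Hypothetical example: a small 9 x 16 target with the toad centred at (4, 8).
target = gaussian_heatmap(9, 16, center_row=4, center_col=8, sigma=2.0)
peak = max(max(row) for row in target)
empty = blank_heatmap(9, 16)
```

A target of this kind gives the network a smooth regression objective whose maximum marks the object location, which is one plausible reason such targets need little post-processing.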
{"title":"Left Ventricle Volume Measuring using Echocardiography Sequences","authors":"Yi Guo, S. Green, L. Park, Lauren Rispen","doi":"10.1109/DICTA.2018.8615766","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615766","url":null,"abstract":"Measuring left ventricle (LV) volume is a challenging problem in physiological study. Echocardiography is one non-intrusive method suited to this task. By extracting the left ventricle area from ultrasound images, the volume can be approximated from the size of the left ventricle area. The core of the problem becomes the identification of the left ventricle in noisy images using spatio-temporal information. We propose adaptive sparse smoothing for left ventricle segmentation in each frame of an echocardiography video, for robustness against the strong speckle noise in ultrasound imagery. We then further adjust the identified left ventricle areas (as curves in a polar coordinate system) by a fixed-rank principal component analysis as post-processing. This method is tested on two data sets in which some frames have left ventricle areas labelled by an expert physiologist, and compared against an active-contour-based method. The experimental results show clearly that the proposed method is more accurate than the competitor.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122182999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meng Yang, Lida Rashidi, A. S. Rao, S. Rajasegarar, Mohadeseh Ganji, M. Palaniswami, C. Leckie
{"title":"Cluster-Based Crowd Movement Behavior Detection","authors":"Meng Yang, Lida Rashidi, A. S. Rao, S. Rajasegarar, Mohadeseh Ganji, M. Palaniswami, C. Leckie","doi":"10.1109/DICTA.2018.8615809","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615809","url":null,"abstract":"Crowd behaviour monitoring and prediction is an important research topic in video surveillance that has gained increasing attention. In this paper, we propose a novel architecture for crowd event detection, which comprises methods for object detection, clustering of various groups of objects, characterizing the movement patterns of the various groups of objects, detecting group events, and finding the change point of group events. In our proposed framework, we use clusters to represent the groups of objects/people present in the scene. We then extract the movement patterns of the various groups of objects over the video sequence to detect movement patterns. We define several crowd events and propose a methodology to detect the change point of the group events over time. We evaluated our scheme using six video sequences from benchmark datasets, which include events such as walking, running, global merging, local merging, global splitting and local splitting. We compared our scheme with state-of-the-art methods and showed the superiority of our method in accurately detecting the crowd behavioral changes.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124800550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similar Gesture Recognition using Hierarchical Classification Approach in RGB Videos","authors":"Di Wu, N. Sharma, M. Blumenstein","doi":"10.1109/DICTA.2018.8615804","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615804","url":null,"abstract":"Recognizing human actions from video streams has become one of the most popular research areas in computer vision and deep learning in recent years. Action recognition is widely used in different real-life scenarios, such as surveillance, robotics, healthcare, video indexing and human-computer interaction. The challenges and complexity involved in developing a video-based human action recognition system are manifold. In particular, recognizing actions with similar gestures and describing complex actions is a very challenging problem. To address these issues, we study the problem of classifying human actions using Convolutional Neural Networks (CNN) and develop a hierarchical 3DCNN architecture for similar gesture recognition. The proposed model first combines similar gesture pairs into one class and classifies them along with all other classes, as a stage-1 classification. In stage-2, similar gesture pairs are classified individually, which reduces the problem to binary classification. We apply and evaluate the developed models to recognize similar human actions on the HMDB51 dataset. The results show that the proposed model can achieve high performance in comparison to the state-of-the-art methods.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131556502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
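The two-stage scheme described in the abstract above (merge each similar gesture pair into one class for stage 1, then separate the pair with a binary classifier in stage 2) can be sketched as a simple dispatch. Everything named below is hypothetical: the label names, the clip fields, and the toy rule-based functions merely stand in for trained 3D CNNs and are not taken from the paper.

```python
def hierarchical_predict(clip, stage1, stage2_by_group):
    """Two-stage prediction: coarse label first, then a binary split if needed."""
    coarse = stage1(clip)
    refine = stage2_by_group.get(coarse)
    # Labels that were never merged have no stage-2 model and are returned as-is.
    return refine(clip) if refine is not None else coarse


# Hypothetical stand-in for the stage-1 model, which sees the merged class.
def stage1(clip):
    return "sit_or_stand" if clip["vertical_motion"] else "wave"


# Hypothetical stand-in for the stage-2 binary model of the merged pair.
def sit_vs_stand(clip):
    return "stand" if clip["moving_up"] else "sit"


stage2_by_group = {"sit_or_stand": sit_vs_stand}

label_a = hierarchical_predict(
    {"vertical_motion": True, "moving_up": True}, stage1, stage2_by_group
)
label_b = hierarchical_predict({"vertical_motion": False}, stage1, stage2_by_group)
```

Keeping the stage-2 models in a dictionary keyed by the merged label means only the confusable pairs pay for a second inference pass.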
{"title":"Heuristic Evaluations of Cultural Heritage Websites","authors":"Duyen Lam, Atul Sajjanhar","doi":"10.1109/DICTA.2018.8615847","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615847","url":null,"abstract":"Heuristic evaluation, a systematic inspection, aims to find the usability problems in websites. Numerous sets of usability heuristics have been adopted for specific fields through the examination and the judgment of evaluators. Cultural heritage has drawn significant interest and needs thorough investigation in order to improve the interfaces of websites and help to promote the cultural values of a country. An in-depth review of the literature on user interface evaluations for cultural heritage is presented. We examine several aspects including cultural dimensions in interface design, cultural-based adaptive web design, and technologies for cultural heritage websites' interfaces. The findings are expected to be a foundation for designing archiving websites in the domain of cultural heritage.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132340546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sui Paul Ang, S. L. Phung, M. Schira, A. Bouzerdoum, S. T. Duong
{"title":"Human Brain Tissue Segmentation in fMRI using Deep Long-Term Recurrent Convolutional Network","authors":"Sui Paul Ang, S. L. Phung, M. Schira, A. Bouzerdoum, S. T. Duong","doi":"10.1109/DICTA.2018.8615850","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615850","url":null,"abstract":"Accurate segmentation of different brain tissue types is an important step in the study of neuronal activities using functional magnetic resonance imaging (fMRI). Traditionally, due to the low spatial resolution of fMRI data and the absence of an automated segmentation approach, human experts often resort to superimposing fMRI data on high resolution structural MRI images for analysis. The recent advent of fMRI with higher spatial resolutions offers a new possibility of differentiating brain tissues by their spatio-temporal characteristics, without relying on the structural MRI images. In this paper, we propose a patch-wise deep learning method for segmenting human brain tissues into five types, which are gray matter, white matter, blood vessel, non-brain and cerebrospinal fluid. The proposed method achieves a classification rate of 84.04% and a Dice similarity coefficient of 76.99%, which exceed those by several other methods.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120974070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
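The abstract above reports a classification rate and a Dice similarity coefficient. As a reminder of what these two figures measure, here is a minimal sketch using the standard definition Dice = 2|A ∩ B| / (|A| + |B|) for one tissue label. The flat label lists and the example values are made up, and the abstract does not say how the per-class scores were aggregated into the single reported number.

```python
def dice_coefficient(predicted, reference, label):
    """Dice = 2|A ∩ B| / (|A| + |B|) over the voxels carrying `label`."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    pred_count = sum(1 for p in predicted if p == label)
    ref_count = sum(1 for r in reference if r == label)
    overlap = sum(
        1 for p, r in zip(predicted, reference) if p == label and r == label
    )
    if pred_count + ref_count == 0:
        return 1.0  # convention chosen here: both empty counts as agreement
    return 2.0 * overlap / (pred_count + ref_count)


def classification_rate(predicted, reference):
    """Fraction of voxels whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Made-up toy labels for four voxels.
predicted = ["gray", "gray", "white", "csf"]
reference = ["gray", "white", "white", "csf"]

dice_gray = dice_coefficient(predicted, reference, "gray")
dice_csf = dice_coefficient(predicted, reference, "csf")
rate = classification_rate(predicted, reference)
```

The two measures can diverge: the classification rate is dominated by the largest tissue classes, whereas a per-class Dice score penalises errors on small classes more visibly.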
{"title":"Table Detection in Document Images using Foreground and Background Features","authors":"Saman Arif, F. Shafait","doi":"10.1109/DICTA.2018.8615795","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615795","url":null,"abstract":"Table detection is an important step in many document analysis systems. It is a difficult problem due to the variety of table layouts, encoding techniques and the similarity of tabular regions with non-tabular document elements. Earlier approaches to table detection are based on heuristic rules or require additional PDF metadata. Recently proposed methods based on machine learning have shown good results. This paper demonstrates a performance improvement over these table detection techniques. The proposed solution is based on the observation that tables tend to contain more numeric data, and hence it applies color coding/coloration as a signal for telling apart numeric and textual data. The deep-learning-based Faster R-CNN is used for detection of tabular regions in document images. To gauge the performance of our proposed solution, the publicly available UNLV dataset is used. Performance measures indicate improvement when compared with best-in-class strategies.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"447 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115606782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kun Zhao, Yuliang Tang, Teng Zhang, J. Carvajal, Daniel F. Smith, A. Wiliem, Peter Hobson, A. Jennings, B. Lovell
{"title":"DGDI: A Dataset for Detecting Glomeruli on Renal Direct Immunofluorescence","authors":"Kun Zhao, Yuliang Tang, Teng Zhang, J. Carvajal, Daniel F. Smith, A. Wiliem, Peter Hobson, A. Jennings, B. Lovell","doi":"10.1109/DICTA.2018.8615769","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615769","url":null,"abstract":"With the growing popularity of whole slide scanners, there is a high demand to develop computer aided diagnostic techniques for this new digitized pathology data. The ability to extract effective information from digital slides, which serve as fundamental representations of the prognostic data patterns or structures, provides promising opportunities to improve the accuracy of automatic disease diagnosis. The recent advances in computer vision have shown that Convolutional Neural Networks (CNNs) can be used to analyze digitized pathology images providing more consistent and objective information to the pathologists. In this paper, to advance the progress in developing computer aided diagnosis systems for renal direct immunofluorescence test, we introduce a new benchmark dataset for Detecting Glomeruli on renal Direct Immunofluorescence (DGDI). To build the baselines, we investigate various CNN-based detectors on DGDI. Experiments demonstrate that DGDI well represents the challenges of renal direct immunofluorescence image analysis and encourages the progress in developing new approaches for understanding renal disease.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121888617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Size-Invariant Attention Accuracy Metric for Image Captioning with High-Resolution Residual Attention","authors":"Zongjian Zhang, Qiang Wu, Yang Wang, Fang Chen","doi":"10.1109/DICTA.2018.8615788","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615788","url":null,"abstract":"Spatial visual attention mechanisms have achieved significant performance improvements for image captioning. To quantitatively evaluate the performances of attention mechanisms, the \"attention correctness\" metric has been proposed to calculate the sum of attention weights generated for ground truth regions. However, this metric cannot consistently measure the attention accuracy among the element regions with large size variance. Moreover, its evaluations are inconsistent with captioning performances across different fine-grained attention resolutions. To address these problems, this paper proposes a size-invariant evaluation metric by normalizing the \"attention correctness\" metric with the size percentage of the attended region. To demonstrate the efficiency of our size-invariant metric, this paper further proposes a high-resolution residual attention model that uses RefineNet as the Fully Convolutional Network (FCN) encoder. By using the COCO-Stuff dataset, we can achieve pixel-level evaluations on both object and \"stuff\" regions. We use our metric to evaluate the proposed attention model across four high fine-grained resolutions (i.e., 27×27, 40×40, 60×60, 80×80). The results demonstrate that, compared with the \"attention correctness\" metric, our size-invariant metric is more consistent with the captioning performances and is more efficient for evaluating the attention accuracy.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
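The abstract above describes "attention correctness" as the sum of attention weights falling inside the ground-truth region, and the proposed metric as that sum normalized by the region's size percentage. The sketch below is one reading of that description, dividing by (region cells / total cells); the exact formula in the paper may differ, and the function names and the 2×2 example grid are assumptions for illustration.

```python
def attention_correctness(weights, mask):
    """Sum of attention weights that fall inside the ground-truth region."""
    return sum(
        w
        for w_row, m_row in zip(weights, mask)
        for w, m in zip(w_row, m_row)
        if m
    )


def size_invariant_correctness(weights, mask):
    """Attention correctness divided by the region's share of the grid (assumed form)."""
    total_cells = sum(len(row) for row in mask)
    region_cells = sum(1 for row in mask for m in row if m)
    if region_cells == 0:
        raise ValueError("ground-truth region is empty")
    return attention_correctness(weights, mask) / (region_cells / total_cells)


# Uniform attention over a 2 x 2 grid: every cell receives weight 0.25.
uniform = [[0.25, 0.25], [0.25, 0.25]]
small_region = [[True, False], [False, False]]
large_region = [[True, True], [False, False]]

raw_small = attention_correctness(uniform, small_region)
raw_large = attention_correctness(uniform, large_region)
norm_small = size_invariant_correctness(uniform, small_region)
norm_large = size_invariant_correctness(uniform, large_region)
```

Under this reading, uninformative uniform attention scores the same for both regions after normalization, whereas the raw sum rewards the larger region simply for being larger.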