{"title":"Accurate Shift Estimation under One-Parameter Geometric Distortion using the Brosc Filter","authors":"P. Fletcher, Matthew R. Arnison, Eric W. Chong","doi":"10.1109/DICTA.2018.8615835","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615835","url":null,"abstract":"Shift estimation is the task of estimating an unknown translation factor which best relates two relatively distorted representations of the same image data. Where distortion is large and also includes rotation and scaling, estimates of the global distortion can be obtained with good accuracy using RST-matching methods, but such algorithms are slow and complicated. Where geometric distortion is small, correlation-based methods can achieve millipixel accuracy. These methods begin to fail, however, when even quite small geometric distortions are present, such as rotation by 1° or 2°, or scaling by as little as 5%. A new spatially-variant filter, the brosc filter (\"better rotation or scaling\"), can be used to preserve the accuracy of correlation-based shift estimation where the expected distortion can be modelled as a single parameter, for example, as a pure rotation, a pure scaling, or a pure scaling along a known axis. By applying the brosc filter before shift estimation, shift accuracy under geometric distortion is improved, and a variant of the brosc filter using complex arithmetic additionally provides an estimate of the single parameter representing the unknown distortion.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123180746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Descriptor-Driven Keypoint Detection","authors":"A. Sluzek","doi":"10.1109/DICTA.2018.8615841","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615841","url":null,"abstract":"A methodology is proposed (and illustrated on exemplary cases) for detecting keypoints in such a way that the usability of those keypoints in image matching tasks can be potentially maximized. Following the approach used for MSER detection, we localize keypoints at image patches for which the selected keypoint descriptor is maximally stable under fluctuations of the parameter(s) (e.g. image threshold, scale, shift, etc.) determining how configurations of those patches evolve. In this way, keypoint descriptors are used in scenarios where the descriptors' volatility due to minor image distortions is minimized and, thus, the performance of keypoint matching is prospectively maximized. Experimental verification on selected types of keypoint descriptors fully confirmed this hypothesis. Additionally, a novel concept of semi-dense feature representation of images (based on the proposed methodology) has been preliminarily discussed and illustrated (and its prospective links with deep learning and tracking applications highlighted).","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123244197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Models for Facial Expression Recognition","authors":"Atul Sajjanhar, Zhaoqi Wu, Q. Wen","doi":"10.1109/DICTA.2018.8615843","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615843","url":null,"abstract":"We investigate facial expression recognition using state-of-the-art classification models. Recently, CNNs have been extensively used for face recognition. However, CNNs have not been thoroughly evaluated for facial expression recognition. In this paper, we train and test a CNN model for facial expression recognition. The performance of this CNN model is used as a benchmark for evaluating other pre-trained deep CNN models. We evaluate the performance of Inception and VGG, which are pre-trained for object recognition, and compare these with VGG-Face, which is pre-trained for face recognition. All experiments are performed on publicly available face databases, namely, CK+, JAFFE and FACES.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123739878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Experience of Depth Estimation on Intricate Objects using Generative Adversarial Networks","authors":"Wai Y. K. San, Teng Zhang, Shaokang Chen, A. Wiliem, Dario Stefanelli, B. Lovell","doi":"10.1109/DICTA.2018.8615783","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615783","url":null,"abstract":"Object parts within a scene observed by the human eye exhibit their own unique depth. Producing a single image with an accurate depth of field has many implications, namely: virtual and augmented reality, mobile robotics, digital photography and medical imaging. In this work, we aim to exploit the effectiveness of conditional Generative Adversarial Networks (GAN) to improve depth estimation from a single inexpensive monocular camera sensor. The complexity of an object's shape, texture and environmental conditions makes depth estimation challenging. Our approach is evaluated on a novel depth map dataset, which we release publicly, containing challenging photo-depth image pairs. Standard evaluation metrics against other depth map estimation techniques demonstrate the effectiveness of our approach. The effectiveness of the GAN on different test data is demonstrated both qualitatively and quantitatively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127048221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A General Approach to Segmentation in CT Grayscale Images using Variable Neighborhood Search","authors":"T. Siriapisith, Worapan Kusakunniran, P. Haddawy","doi":"10.1109/DICTA.2018.8615823","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615823","url":null,"abstract":"Medical image segmentation is essential for several tasks including pre-treatment planning and tumor monitoring. Computed tomography (CT) is the most useful imaging modality for abdominal organs and tumors, with the benefits of high imaging resolution and few motion artifacts. Unfortunately, CT images contain only limited information of intensity and gradient, which makes accurate segmentation a challenge. In this paper, we propose a 2D segmentation method that applies the concept of variable neighborhood search (VNS) by iteratively alternating search through intensity and gradient spaces. By alternating between the two search spaces, the technique can escape local minima that occur when segmenting in a single search space. The main techniques used in the proposed framework are graph-cut with probability density function (GCPDF) and graph-cut based active contour (GCBAC). The presented method is quantitatively evaluated on a public clinical dataset, which includes various sizes of liver tumor, kidney and spleen. The segmentation performance is evaluated using the dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), and volume difference (VD). The presented method achieves outstanding segmentation performance, with a DSC of 84.48±5.84%, 76.93±8.24%, 91.70±2.68% and 89.27±5.21% for large liver tumor, small liver tumor, kidney and spleen, respectively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"299302 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116576789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binarization of Color Character Strings in Scene Images using Deep Neural Network","authors":"Wenjiao Bian, T. Wakahara, Tao Wu, He Tang, Jirui Lin","doi":"10.1109/DICTA.2018.8615837","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615837","url":null,"abstract":"This paper addresses the problem of binarizing multicolored character strings in scene images with complex backgrounds and heavy image degradations. The proposed method consists of three steps. The first step is combinatorial generation of binarized images via every dichotomization of K clusters obtained by K-means clustering of the constituent pixels of an input image in the HSI color space. The second step is classification of each binarized image, using a deep neural network, into two categories: character string and non-character string. The final step is selection of the single binarized image with the highest character-string score as the optimal binarization result. Experimental results using the ICDAR 2003 robust word recognition dataset show that the proposed method achieves a correct binarization rate of 87.4%, which is highly competitive with the state of the art in binarization of scene character strings.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123634370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Colour Analysis of Strawberries on a Real Time Production Line","authors":"Gilbert Eaton, Andrew Busch, Rudi Bartels, Yongsheng Gao","doi":"10.1109/DICTA.2018.8615779","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615779","url":null,"abstract":"A novel system has been designed in which colour analysis algorithms facilitate grading the ripeness of packed strawberries on a fast-paced production line. The strawberry quality system acquires images at a rate of 2 punnets/s and feeds the images to the two algorithms. Using the CIELAB and HSV colour spaces, both underripe and overripe colour features are analysed, resulting in F1 scores of 94.7% and 90.6% respectively when measured on multiple-defect samples. The single-defect class results scored 80.1% and 77.1%. The algorithms' total time on the current hardware configuration is 121 ms maximum and 80 ms average, which is well below the required time window of 500 ms. In total, 105,542 punnets have been assessed by the algorithm, which has rejected 4,952 (4.9%), helping to ensure the quality of the product being shipped to customers and avoiding costly returns.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126286450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RGB-D Fall Detection via Deep Residual Convolutional LSTM Networks","authors":"A. Abobakr, M. Hossny, Hala Abdelkader, S. Nahavandi","doi":"10.1109/DICTA.2018.8615759","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615759","url":null,"abstract":"The development of smart healthcare environments has witnessed impressive advancements exploiting recent technological capabilities. Since falls are considered a major health concern, especially among older adults, low-cost fall detection systems have become an indispensable component in these environments. This paper proposes an integrable, privacy-preserving and efficient fall detection system based on depth images acquired using a Kinect RGB-D sensor. The proposed system uses an end-to-end deep learning architecture composed of convolutional and recurrent neural networks to detect fall events. The deep convolutional network (ConvNet) analyses the human body and extracts visual features from input sequence frames. Fall events are detected by modeling complex temporal dependencies between subsequent frame features using Long Short-Term Memory (LSTM) recurrent neural networks. Both models are combined and jointly trained in an end-to-end ConvLSTM architecture. This allows the model to learn visual representations and the complex temporal dynamics of fall motions simultaneously. The proposed method has been validated on the public URFD fall detection dataset and compared with different approaches, including accelerometer-based methods. We achieved near-unity sensitivity and specificity rates in detecting fall events.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121980119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Grained Categorization by Deep Part-Collaboration Convolution Net","authors":"Qiyu Liao, H. Holewa, Min Xu, Dadong Wang","doi":"10.1109/DICTA.2018.8615855","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615855","url":null,"abstract":"In a part-based categorization context, the ability to learn representative features from quantitatively tiny object parts is of similar importance to exactly localizing the parts. We propose a new deep net structure for fine-grained categorization that follows the taxonomy workflow, which makes it interpretable and understandable for humans. By training customized sub-nets on each manually annotated part, we increased the state-of-the-art part-based classification accuracy on the general fine-grained CUB-200-2011 dataset by 2.1%. Our study shows the proposed method can produce stronger activations to discriminate subtle part differences while maintaining high computing performance, by applying a set of strategies to optimize the deep net structure.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133670599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hand Detection using Deformable Part Models on an Egocentric Perspective","authors":"Sergio R. Cruz, Antoni B. Chan","doi":"10.1109/DICTA.2018.8615781","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615781","url":null,"abstract":"The egocentric perspective is a recent viewpoint enabled by new devices like the GoPro and Google Glass, which are becoming more available to the public. The hands are the most consistent objects in the egocentric perspective, and they can reveal much about people and their activities, but the nature of the perspective and the ever-changing shape of the hands makes them difficult to detect. Previous work has focused on indoor environments or controlled data, since such settings are simpler to approach, but in this work we use data with changing backgrounds and variable illumination, which is more challenging. We use a Deformable Part Model based approach to generate hand proposals, since it can handle the many gestures the hand can adopt and rivals other techniques at locating the hands while reducing the number of proposals. We also use the location and size at which the hands appear in the image to reduce the number of detections. Finally, a CNN classifier is applied to remove the remaining false positives and produce the final hand detections.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127426397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}