Title: Space-Time Skeletal Analysis with Jointly Dual-Stream ConvNet for Action Recognition
Authors: Thien Huynh-The, Cam-Hao Hua, Nguyen Anh Tu, Dong-Seong Kim
DOI: https://doi.org/10.1109/DICTA51227.2020.9363422
Abstract: Over the past decade, although numerous conventional methods have been introduced for three-dimensional (3D) skeleton-based human action recognition, they share a primary limitation: the recognition models learned from low-level handcrafted features are fragile. This paper proposes an effective deep convolutional neural network (CNN) with a dual-stream architecture that simultaneously learns geometry-based static pose and dynamic motion features for high-performance action recognition. Each stream consists of several advanced blocks of regular and grouped convolutional layers, in which various kernel sizes are configured to enrich the representational features. Notably, the blocks within each stream are associated via a skip-connection scheme to overcome the vanishing-gradient problem, while the blocks of the two streams are jointly connected via a customized layer to partly share highly relevant knowledge gained during model training. In the experiments, the method is evaluated intensively on the NTU RGB+D dataset and its upgraded version with up to 120 action classes, where the proposed CNN achieves competitive performance in terms of accuracy and complexity compared to several other deep models.
Title: Efficient Brain Tumor Segmentation with Dilated Multi-fiber Network and Weighted Bi-directional Feature Pyramid Network
Authors: T. Nguyen, Cong Hau Le, D. V. Sang, Tingting Yao, Wei Li, Zhiyong Wang
DOI: https://doi.org/10.1109/DICTA51227.2020.9363380
Abstract: Brain tumor segmentation is critical for precise diagnosis and personalised treatment of brain cancer. Due to the recent success of deep learning, many deep-learning-based segmentation methods have been developed. However, most of them are computationally expensive due to complicated network architectures. Recently, multi-fiber networks were proposed to reduce the number of network parameters in U-Net-based brain tumor segmentation through efficient graph convolution. However, the efficient use of multi-scale features between the contracting and expanding paths has not been well explored beyond simple concatenation. In this paper, we propose a light-weight network where the contracting and expanding paths are connected with fused multi-scale features through a bi-directional feature pyramid network (BiFPN). The backbone of our proposed network is a U-Net architecture built on dilated multi-fiber (DMF) structures. First, conventional convolutional layers along the contracting and expanding paths are replaced with a DMF network and an MF network, respectively, to reduce the overall network size. In addition, a learnable weighted DMF network is utilized to take different receptive field sizes into account effectively. Next, a weighted BiFPN is utilized to connect the contracting and expanding paths, which enables more effective and efficient flow of multi-scale features between the two paths. Note that the BiFPN block can be repeated as necessary. As a result, our proposed network is able to further reduce the network size without clearly compromising segmentation accuracy. Experimental results on the popular BraTS 2018 dataset demonstrate that our proposed light-weight architecture achieves results at least comparable to state-of-the-art methods with significantly reduced network complexity and computation time. The source code of this paper will be made available on GitHub.
Title: Moving Object Detection for Humanoid Navigation in Cluttered Dynamic Indoor Environments Using a Confidence Tracking Approach
Authors: Prabin Kumar Rath, A. Ramirez-Serrano, D. K. Pratihar
DOI: https://doi.org/10.1109/DICTA51227.2020.9363413
Abstract: Humanoid robot perception is challenging compared to perception in other robotic systems. The sensors in a humanoid are in a constant state of motion, and their pose estimates are affected by the motion of the robot's tens of degrees of freedom (DOFs), which in turn affects the estimation of the sensed environmental objects. This is especially problematic in highly cluttered dynamic spaces such as indoor office environments. One of the challenges is identifying the presence of all independently moving/dynamic entities, such as people walking around the robot. If available, such information would help humanoids build better maps and better plan their motions in unstructured, confined, dynamic environments. This paper presents a moving object detection pipeline based on relative motion and a novel confidence tracking approach that detects point clusters corresponding to independently moving entities around the robot. The detection does not depend on prior knowledge about the target entity. A ground plane removal tool based on voxel grid covariance is used to separate point clusters of objects within the environment. The proposed method was tested using a Velodyne VLP-16 LiDAR and an Intel T265 IMU mounted on a gimbal-stabilized humanoid head. The experiments show promising results with real-time computational performance.
Title: Secure Fingerprint Authentication with Homomorphic Encryption
Authors: Wencheng Yang, Song Wang, Kan Yu, James Jin Kang, Michael N. Johnstone
DOI: https://doi.org/10.1109/DICTA51227.2020.9363426
Abstract: Biometric-based authentication has recently become prevalent as an alternative to traditional password- and/or token-based authentication in many applications, owing to both user convenience and the stability and uniqueness of biometric traits. However, biometric template data, being uniquely linked to a user's identity, are considered sensitive information and should therefore be secured to prevent privacy leakage. In this paper, we propose a homomorphic-encryption-based fingerprint authentication system that provides access control while protecting sensitive biometric template data. Using homomorphic encryption, matching of biometric data can be performed in the encrypted domain, increasing the difficulty for attackers to obtain the original biometric template without knowing the private key. Moreover, the trade-off between computational overhead and authentication accuracy is studied and experimentally verified on a publicly available fingerprint database, FVC2002 DB2.
{"title":"W-A net: Leveraging Atrous and Deformable Convolutions for Efficient Text Detection","authors":"Sukhad Anand, Z. Khan","doi":"10.1109/DICTA51227.2020.9363428","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363428","url":null,"abstract":"Scene text detection has been gaining a lots of focus in research. Even though the recent methods are able to detect text in complex background having complex shapes with a fairly good accuracy, they still suffer from issues of limited receptive field. These fail from detecting extremely short or long words hence failing in detecting text words precisely in document text images. We propose a new model which we call W-A net, because of it's W shape with the middle branch being Atrous convolutional layers. Our model predicts a segmentation map which divides the image into word and no word regions and also, a boundary map which helps to segregate closer words from each other. We use Atrous convolutions and Deformable convolutional layers to increase the receptive field which helps to detect long words in an image. We treat text detection problem as a single problem irrespective of the background, making our model suitable of detecting text in scene or document images. We present our findings on two scene text datasets and a receipt dataset. Our results show that our method performs better than recent scene text detection methods which perform poorly on document text images, especially receipt images with short words.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124874217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Reconstruction and Object Detection for HoloLens","authors":"Zequn Wu, Tianhao Zhao, Chuong V. Nguyen","doi":"10.1109/DICTA51227.2020.9363378","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363378","url":null,"abstract":"Current smart glasses such as HoloLens excel at positioning within the physical environment, however object and task recognition are still relatively primitive. We aim to expand the available benefits of MR/AR systems by using semantic object recognition and 3D reconstruction. Particularly in this preliminary study, we successfully use a HoloLens to build 3D maps, recognise and count objects in a working environment. This is achieved by offloading these computationally expensive tasks to a remote GPU server. To further achieve realtime feedback and parallelise tasks, object detection is performed on 2D images and mapped to 3D reconstructed space. Fusion of multiple views of 2D detection is additionally performed to refine 3D object bounding boxes and separate nearby objects.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121953822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: PlaneCalib: Automatic Camera Calibration by Multiple Observations of Rigid Objects on a Plane
Authors: Vojtech Bartl, Roman Juránek, Jakub Špaňhel, A. Herout
DOI: https://doi.org/10.1109/DICTA51227.2020.9363417
Abstract: In this work, we propose a novel method for automatic camera calibration, mainly for surveillance cameras. The calibration consists of observing objects on the ground plane of the scene; in our experiments, vehicles were used. However, arbitrary rigid objects can be used instead, as verified by experiments with synthetic data. The calibration process uses a convolutional neural network to localise landmarks on the observed objects in the scene together with the corresponding 3D positions of the localised landmarks; thus, fine-grained classification of the detected vehicles is performed in the image plane. The observation of the objects (detection, classification, and landmark detection) makes it possible to determine all typically used camera calibration parameters (focal length, rotation matrix, and translation vector). The experiments with real data show slightly better results than state-of-the-art work, but with an extreme speed-up: the calibration error decreased from 3.01% to 2.72%, and computation was 1223× faster.
{"title":"A Survey on Training Free 3D Texture-less Object Recognition Techniques","authors":"Piyush Joshi, Alireza Rastegarpanah, R. Stolkin","doi":"10.1109/DICTA51227.2020.9363389","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363389","url":null,"abstract":"Local surface feature based 3D object recognition is a rapidly growing research field. In time-critical applications such as robotics, training free recognition techniques are always the first choice as they are free from heavy statistical training. This paper presents an experimental analysis of 3D texture-less object recognition techniques that are free from any training. To our best knowledge, this is the first survey that includes experimental evaluation of top-rated training free recognition techniques on the datasets acquired by an RGBD camera. Based on the experimentation, we briefly present a discussion on potential future research directions.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122341492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pixel-RRT*: A Novel Skeleton Trajectory Search Algorithm for Hepatic Vessels","authors":"Jianfeng Zhang, Wanru Chang, Fa Wu, D. Kong","doi":"10.1109/DICTA51227.2020.9363424","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363424","url":null,"abstract":"In the clinical treatment of liver disease such as tumor, the acquisition of vascular skeleton trajectory is of great worth to untangle the basin and venation of hepatic vessels, because tumor and vessels are closely intertwined. In most cases, skeletonization based on the results of vascular segmentation will be prone to fracture due to the discontinuous segmenting results of vessels. As the overall tree-like system of hepatic vessels is a thin tubular tissue, we expect to start the analysis of vessels from vascular skeleton to vascular boundary, not the contrary, which can more effectively implement the image computing of hepatic vessels and interpret the tree-like expansion. To this issue, in this paper, we propose an innovative approach Pixel-RRT* inspired by Marray's Law and the growing rule of biological vasculature. It can be applied to the skeleton trajectory search for the intricate hepatic vessels. In Pixel-RRT*, we introduce the novel pixel-based cost function, the design of pixel-distributed random sampling, and a multi-goal strategy in the shared graph of random tree based on the general algorithmic framework of RRT* and RRT. Without any prior segmentation of the vessels, the proposed Pixel-RRT* can rapidly return the rationally bifurcated vascular trajectories satisfying the principle of minimal energy and topological continuity. In addition, we put forward an adaptively interpolated variational method as the postprocessing technique to make the vascular trajectory smoother by the means of energy minimization. The simulation experiments and examples of hepatic vessels demonstrate our method is efficient and utilisable. The codes will be made available at https://github.com/JeffJFZ/Pixel-RRTStar.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113968632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Learning Affordance Segmentation: An Investigative Study
Authors: Chau D. M. Nguyen, S. Z. Gilani, S. Islam, D. Suter
DOI: https://doi.org/10.1109/DICTA51227.2020.9363390
Abstract: Affordance segmentation aims at recognising, localising, and segmenting affordances in images, enabling scene understanding of visual content in many robotic perception applications. Supervised learning with deep networks has become very popular for affordance segmentation. However, very few studies have investigated the factors that contribute to improved learning of affordances. This investigation is essential for improving precision and balancing cost-efficiency when learning affordance segmentation. In this paper, we address this task and identify two prime factors affecting the precision of learned affordance segmentation: (1) the quality of features extracted from the classification module and (2) the dearth of information in the Region Proposal Network (RPN). Consequently, we replace the backbone classification model and introduce a novel multiple-alignment strategy in the RPN. Results obtained through extensive experimentation validate our contributions and outperform state-of-the-art affordance segmentation models.