{"title":"View Synthesis with Multi-scale Cost Aggregation and Confidence Prior","authors":"Qi Wu, Xue Wang, Qing Wang","doi":"10.1109/DICTA52665.2021.9647048","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647048","url":null,"abstract":"This paper presents a learning-based novel view synthesis (NVS) approach from wide-baseline image pairs. Inspired by prior work, we first predict a depth probability volume which represents the scene structure as a set of depth probability layers (DPLs) within a reference view frustum. To reduce geometric uncertainty in ambiguous regions between input images, a multi-scale cost aggregation network is proposed to generate the DPLs for both input views without supervision. Furthermore, to mitigate the depth discretization artifacts in distant views, we calculate the disparity map of the target view by passing the DPLs warped to the target view through a CNN-based fusion network. Finally, the predicted view is obtained by incorporating the disparity map, warped input images and the confidence prior together. The proposed method improves performance in challenging scenarios such as texture-less or non-textured regions, occlusion boundaries, non-Lambertian surfaces, and distant viewpoints. 
Experimental results show that our method achieves state-of-the-art view interpolation and extrapolation results on the RealEstate10K mini dataset.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116788699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-View DCNN Based Method for Breast Cancer Screening","authors":"N. Derbel, Hedi Tmar, A. Mahfoudhi","doi":"10.1109/DICTA52665.2021.9647297","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647297","url":null,"abstract":"Breast cancer is the most widespread cancer amongst women worldwide. Screening mammography is widely acknowledged to be the most effective imaging tool for detecting breast cancer early, and early detection is associated with a decrease in incidence and mortality rates. However, the limiting factors of mammography are i) mammograms are difficult to interpret due to the high density of breast tissue, ii) the workload of radiologists, which is made worse by the double reading process, and iii) false positive recalls are often accompanied by needless tests and biopsies. We propose a multi-view design of a deep convolutional neural network to carry out the mammography screening task - the resulting network extracts distinctive characteristics from the Medio-Lateral Oblique (MLO) and Cranial Caudal (CC) mammography views of each breast (a set of 4 images). We test it on a subset selected from the open Digital Database for Screening Mammography (DDSM), where each exam is composed of 4 images. We demonstrate that our approach can outperform existing ones in terms of both prediction accuracy and false positive rate reduction. 
Our method achieves a specificity of 97.77% and an accuracy of 98.7%.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129700194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generative Deep Learning Approach for Forensic Facial Reconstruction","authors":"Mitchell Hargreaves, David Ting, Stephen Bajan, Kamron Bhavnagri, R. Bassed, Xiaojun Chang","doi":"10.1109/DICTA52665.2021.9647290","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647290","url":null,"abstract":"Forensic facial reconstruction currently relies on subjective manual methods to reconstruct a recognizable face from a skull. Automated approaches using algorithms and statistical norms have been able to reliably construct faces in real-time, but are unable to generate areas of the face that do not correlate strongly to the bone beneath, such as the eyes, lips and ears. Recent developments in deep learning have shown that generative models can produce realistic images indistinguishable from genuine human faces. Applying these techniques, we propose a generative deep learning solution to perform facial reconstruction directly from the bone with limited data and no background expertise. The model is trained on 665 3D Computed Tomography (CT) head scans that have been cleaned of noise, rotated to the correct orientation and then filtered by density to find bone and soft tissue to be used as input and label, respectively. It is trained with a combination of adversarial and VGGFace2 perceptual loss. The model is then compared to two baseline deep generative models and achieves an mIoU of 0.9410 and a facial detection score of 88.32%. Results show the model is able to consistently generate accurate jaw and muscle structures. Additionally, it generates realistic ears, eyes and noses, features that are difficult to generate automatically with traditional techniques. 
Our model provides the basis for a complete, end-to-end solution for forensic facial reconstruction, with no prior knowledge or training on reconstruction techniques.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126735029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OCT retinal image-to-image translation: Analysing the use of CycleGAN to improve retinal boundary semantic segmentation","authors":"Ignacio A. Viedma, D. Alonso-Caneiro, Scott A. Read, M. Collins","doi":"10.1109/DICTA52665.2021.9647266","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647266","url":null,"abstract":"Optical coherence tomography (OCT) images of the posterior eye provide valuable clinical information. To quantify these images and extract appropriate biomarkers, methods to segment the different retinal boundaries are needed. In recent years, deep learning methods have been applied to perform this image analysis task, providing state-of-the-art performance. However, these methods can be affected by image variability, particularly if the network is trained with images whose features do not match those of the testing dataset. One of the common sources of variability in OCT is speckle noise. In this work, the effect of noise on the semantic segmentation process is investigated, and the use of a CycleGAN method for image-to-image translation to reduce noise, along with its impact on segmentation, is assessed. The results show promising performance and demonstrate the potential of this generative adversarial network (GAN) method to positively impact medical image segmentation, achieving good results in terms of Dice coefficient overlap and boundary error metrics. 
The findings of this work may be translated to other applications such as ‘OCT instrument translation’ to create instrument-agnostic segmentation solutions.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125829030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain MRI motion artifact reduction using 3D conditional generative adversarial networks on simulated motion","authors":"M. Ghaffari, K. Pawar, R. Oliver","doi":"10.1109/DICTA52665.2021.9647370","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647370","url":null,"abstract":"Magnetic resonance imaging is a well-established technique for clinical diagnosis and quantifying disease modifying therapies. Whilst the exquisite tissue contrast sensitivity of MRI is indisputable, the modality can be susceptible to motion artifacts occurring during the acquisition phase. In this work, we propose to use a 3D deep learning network based on conditional generative adversarial networks (cGAN) for retrospective brain MRI motion artifact reduction from T1 weighted (T1-w) images. In order to create ground truth (motion-free) images for training, we selected a large clean dataset from the Human Connectome Project (HCP) of 1200 subjects and simulated motion artifacts to produce motion-corrupted data. To evaluate model performance and its generalisability, we tested on 300 unseen motion-corrupted images. The model performance was compared with a conventional model (Gaussian smoothing), as well as two state-of-the-art models: a 3D Generic U-net and MoCoNet. Normalized mean squared error (NMSE), mean structural similarity (SSIM), and peak signal-to-noise ratio (PSNR) were used as evaluation metrics. The proposed model outperformed all other models by decreasing NMSE from 0.042 (ground truth vs. image with motion simulation) to 0.032 (ground truth vs. model output: motion reduced), enhancing SSIM from 0.964 to 0.98, and increasing PSNR from 33.43 to 34.23. 
The promising model performance suggests its potential for use in a clinical setting to enhance the overall visual perception of 3D T1-w brain scans after image acquisition.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128250638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Chaos Theory Approach to Understand Neural Network Optimization","authors":"M. Sasdelli, Thalaiyasingam Ajanthan, Tat-Jun Chin, G. Carneiro","doi":"10.1109/DICTA52665.2021.9647143","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647143","url":null,"abstract":"Despite the complicated structure of modern deep neural network architectures, they are still optimized with algorithms based on Stochastic Gradient Descent (SGD). However, the reason behind the effectiveness of SGD is not well understood, making its study an active research area. In this paper, we formulate deep neural network optimization as a dynamical system and show that the rigorous theory developed to study chaotic systems can be useful to understand SGD and its variants. In particular, we first observe that the inverse of the instability timescale of SGD optimization, represented by the largest Lyapunov exponent, corresponds to the most negative eigenvalue of the Hessian of the loss. This observation enables the introduction of an efficient method to estimate the largest eigenvalue of the Hessian. Then, we empirically show that for a large range of learning rates, SGD traverses the loss landscape across regions with largest eigenvalue of the Hessian similar to the inverse of the learning rate. This explains why effective learning rates can be found to be within a large range of values and shows that SGD implicitly uses the largest eigenvalue of the Hessian while traversing the loss landscape. This sheds some light on the effectiveness of SGD over more sophisticated second-order methods. We also propose a quasi-Newton method that dynamically estimates an optimal learning rate for the optimization of deep learning models. 
We demonstrate that our observations and methods are robust across different architectures and loss functions on the CIFAR-10 dataset.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131647408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lumbar Spine CT synthesis from MR images using CycleGAN - a preliminary study","authors":"M. Bajger, Minh-Son To, Gobert N. Lee, A. Wells, Chee Chong, M. Agzarian, S. Poonnoose","doi":"10.1109/DICTA52665.2021.9647237","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647237","url":null,"abstract":"In this paper, we investigate the generation of lumbar spine synthetic CT (sCT) images based on MR images for MR-only spinal cord injury treatment and surgery planning. CT and MRI provide complementary information and are both important for spine treatment and surgery planning. However, the acquisition of images of two different modalities interrupts the clinical workflow, adds to health care costs and poses challenges in registering the images for analysis. Translating MR images to CT images would result in seamless correlation between images and also save patients from exposure to ionizing radiation due to a CT examination. Using a large clinical dataset of 800 patients, we showed that a cycle-consistent generative adversarial network (CycleGAN) can be trained with the unpaired, unaligned MR and CT images of the lumbar spine to generate realistic synthetic CT images. The trained model was evaluated using the paired MR and CT images of 8 patients; the average Mean Absolute Error (MAE) was found to be 184 HU with a standard deviation of 24 HU.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"385 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131125291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OCT chorio-retinal segmentation with adversarial loss","authors":"J. Kugelman, D. Alonso-Caneiro, Scott A. Read, Stephen J. Vincent, M. Collins","doi":"10.1109/DICTA52665.2021.9647099","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647099","url":null,"abstract":"Deep learning methods provide state-of-the-art performance for the semantic segmentation of the retina and choroid in optical coherence tomography (OCT) images, enabling rapid, accurate and automatic analyses. However, high-difficulty scans can still pose a problem even for the current state-of-the-art methods. Generative adversarial networks (GANs) are a family of deep learning methods that provide significant benefits for several applications due to their ability to learn complex data distributions, such as those of large image datasets. Segmentation is one of these applications, and it has been investigated in several modalities including retinal fundus image analysis, resulting in performance improvements when incorporating an adversarial loss for segmentation. However, the application of GAN-based segmentation to OCT images has not been investigated in detail and has not been studied at all in the context of choroidal segmentation. In this study, we investigate the use of a GAN to perform semantic segmentation of the retina and choroid in OCT images, by replacing the traditional segmentation loss with an adversarial loss. A detailed analysis of important training parameters and network architecture choices is provided to 1) better understand their behavior and 2) optimize performance for chorio-retinal segmentation in OCT images. A key difference of this study is that, by considering the loss in isolation and comparing to traditional segmentation losses using an identical segmentation network, an unbiased and transparent comparison is performed. Using an optimized adversarial loss, strong performance is observed, providing near comparable performance to traditional segmentation losses. 
The results from this experiment provide a strong foundation for future work with GAN-based OCT retinal and choroidal segmentation.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132073923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image data augmentation for improving performance of deep learning-based model in pathological lung segmentation","authors":"Md. Shariful Alam, Dadong Wang, A. Sowmya","doi":"10.1109/DICTA52665.2021.9647209","DOIUrl":"https://doi.org/10.1109/DICTA52665.2021.9647209","url":null,"abstract":"Accurate segmentation of lung fields from chest X-ray (CXR) images is very important for subsequent analysis of many pulmonary diseases. Deep Neural Network (DNN)-based methods have achieved remarkable progress in many image related tasks. However, their performance depends highly on the distribution of training and test samples: they perform well if both training and test samples are from the same distribution. For example, DNN-based lung segmentation methods perform well on segmentation of healthy lungs or lungs with mild disease; however, their performance is poor on lungs with severe abnormalities. Pulmonary opacification, which blurs the lung boundary, is one of the main reasons. A solution to this problem is data augmentation to increase the pool of training images; however, despite the great success of traditional data augmentation techniques for natural images, they are not very effective for medical images. To simulate CXR images with opacification and low contrast, we present a novel image data augmentation technique in this study. To generate an augmented image, we first select a random area inside the lung and then blur the area with a Gaussian filter. Low contrast is then simulated by adjusting the contrast and brightness. To evaluate the utility of the proposed augmentation technique, we applied it to images with different pulmonary diseases such as tuberculosis, pneumoconiosis and COVID-19 from three public datasets as well as a private dataset, and compared its effect on segmentation performance with traditional data augmentation techniques. 
Results suggest that the proposed technique outperforms traditional data augmentation techniques for all datasets on lung segmentation, in terms of Dice Coefficient (DC) and Jaccard Index (JI). Extensive experiments on multiple datasets validate the effectiveness of the proposed data augmentation technique.","PeriodicalId":424950,"journal":{"name":"2021 Digital Image Computing: Techniques and Applications (DICTA)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131096906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}