{"title":"ARO-DeepSFM: deep structure-from-motion with alternating recursive optimization","authors":"Rongcheng Cui, Haoyuan Huang","doi":"10.1117/12.2644363","DOIUrl":"https://doi.org/10.1117/12.2644363","url":null,"abstract":"Structure from Motion (SfM) is the cornerstone of 3D reconstruction and visualization of SLAM. Existing deep learning approaches formulate problems by restoring absolute pose ratios from two consecutive frames or predicting a depth map from a single image, both of which are unsuitable problems. In order to solve this maladaptation problem and further tap the potential of neural networks in SfM, this paper proposes a new optimization model for deep motion structure recovery based on recurrent neural networks. The model consists of two architectures based on depth and posture estimation of costs, and is constantly iteratively updated alternately to improve both systems. The neural optimizer designed here tracks historical information during iterations to minimize feature metric cost update depth and camera poses. Experiments show that the optimization model of deep motion structure recovery in this paper is superior to the previous method, effectively reducing the cost of feature-metric, while refining depth and poses.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125872692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FSC-UNet: a lightweight medical image segmentation algorithm fused with skip connections","authors":"Yixin Chen, Jianjun Zhang, Xulin Zong, Zhipeng Zhao, Hanqing Liu, Ruichun Tang, Peishun Liu, Jinyu Wang","doi":"10.1117/12.2644360","DOIUrl":"https://doi.org/10.1117/12.2644360","url":null,"abstract":"In order to study the effect of skip connections to segmentation performance in encoder and decoder networks, in this paper, we improve the skip connections of U-Net model and adopt the method of sub-module fusion connection. We fuse the high and low layers of the encoder by multi-head attention. Fusion is performed separately, and the fusion result is connected to the decoder. Considering that different input images have different effects to model training due to factors such as noise, we set the threshold by calculating the Euclidean distance between the image and the mask during training, so that different images use different skip connection methods. Experiments on Cell nuclei, Synapse, Heart, Chaos datasets show that FSC-UNet algorithm this paper proposed has better results than existing algorithms.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126672535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monocular inertial indoor location algorithm considering point and line features","authors":"Ju Huo, Liang Wei, Chuwei Mao","doi":"10.1117/12.2644842","DOIUrl":"https://doi.org/10.1117/12.2644842","url":null,"abstract":"Compared with point features, line features in the environment have more structural information. When indoor texture is not rich, making full use of the structural information of line features can improve the robustness and accuracy of simultaneous location and mapping algorithm. In this paper, we propose an improved monocular inertial indoor location algorithm considering point and line features. Firstly, the point features and line features in the environment are extracted, matched and parameterized, and then the inertial sensor is used to estimate the initial pose, and the tightly coupled method is adopted to optimize the observation error of the point and line features and the measurement error of the inertial sensor simultaneously in the back optimization to achieve accurate estimation of the pose of unmanned aerial vehicle. Finally, loop closure detection and pose graph optimization are used to optimize the pose in real time. The test results on public datasets show that the location accuracy of the proposed method is superior to 10 cm under sufficient light and texture conditions. The angle measurement accuracy is better than 0.05 rad, and the output frequency of positioning results is 10Hz, which effectively improves the accuracy of traditional visual inertial location method and meets the requirements of real-time.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123054438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Makeup transfer model based on BeautyGAN","authors":"Feng Zhang, Chunman Yan, Chen Qiu","doi":"10.1117/12.2644376","DOIUrl":"https://doi.org/10.1117/12.2644376","url":null,"abstract":"Facial makeup transfer can realize automatic application of any makeup styles on the target face without the change of face identity. BeautyGAN enables unsupervised makeup transfer, but there are several problems with generated images, that is, partial loss of makeup effect, poor performance in makeup transfer while the input images or backgrounds are complex, and difficulty in transferring low-resolution images directly. To solve these problems, BeautyGAN, an existing makeup transfer model, was optimized. Referring to the fast style transfer algorithm, a BeautyGAN-based makeup transfer model was designed and developed by introducing a perceptual loss model to improve the performance of BeautyGAN in extracting facial features. The input image is preprocessed by SRGAN network to adapt low-resolution images to BeautyGAN model. The results show that the optimized BeautyGAN has improved local migration performance and can be put into real time operation during testing. Compared with BeautyGAN, the effect of makeup transfer has been significantly improved on the input images with facial expressions, facial occlusion or small angle pose. It is also compatible with low-resolution images.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133076521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAR target image edge detection based on CNN","authors":"Wozhan Li, Xiaochuang Wu, Qiang Yang","doi":"10.1117/12.2643482","DOIUrl":"https://doi.org/10.1117/12.2643482","url":null,"abstract":"Aiming at the problems that the classical edge detection method is easily affected by noise and has low detection accuracy when applied to SAR target images, this paper studies the detection performance of the classical edge detection method Canny, CNN-based edge detection methods Holistically Nested Edge Detection (HED) and Richer Convolutional Features (RCF) when applied to SAR target images for the first time. The detection performance is evaluated using the MSTAR dataset, and the detection results of each method are compared based on the common evaluation indicators of image edge detection: F-measure, PR curve, and FPS. Canny's F-measure (ODS) is 0.611 and FPS is 43. The F-measure (ODS) of HED is 0.758 and the FPS is 18. The F-measure (ODS) of RCF is 0.729 and the FPS is 24. The F-measure (ODS) of RCF-MS is 0.753 and the FPS is 6. On the MSTAR dataset, the F-measure of HED is the best, which is 24.06% higher than Canny. RCF and RCF-MS also performed well, which were 19.31% and 23.24% higher than Canny respectively. The edge detection method based on CNN has higher F-measure, is less affected by noise, and has less loss of edge details. When applied to SAR images affected by speckle noise, the performance is much better than Canny, but there is still a shortage of slightly worse computing speed.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128609875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectrogram-based speech enhancement by spatial attention generative adversarial networks","authors":"Haixin Luo, Shengyu Lu, Qian Wei, Yu Fu, Jindong Tian","doi":"10.1117/12.2644385","DOIUrl":"https://doi.org/10.1117/12.2644385","url":null,"abstract":"The spectrogram can clearly show the composition of different frequencies in the speech signal. In this paper, a speech enhancement method based on deep learning image processing is proposed, which optimizes the spectrogram of the laser detected speech signal to achieve speech enhancement. The laser beam emitted by the laser Doppler vibrometer (LDV) is focused on the glass window to detect the vibration caused by sound wave. After conversion, the audio information that causes vibration is obtained. Under the interference of speckle noise and air disturbance, the detected speech signal not only has a low signal-to-noise ratio (SNR) but also has non-stationary noise. In order to overcome the difficulty that traditional methods are difficult to extract weak signals in the case of severe noise interference, we use deep learning to achieve spectrogram noise reduction and speech information enhancement. By processing the spectrogram of noisy speech with the generative adversarial networks (GAN) combined with the spatial attention mechanism and introducing the short-time objective intelligibility (STOI) into the loss function, the laser detected speech signal was successfully enhanced.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116809218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation based lidar odometry and mapping","authors":"Yinan Wang, Rongchuan Cao, Tianqi Zhang, Kun Yan, Xiaoli Zhang","doi":"10.1117/12.2644264","DOIUrl":"https://doi.org/10.1117/12.2644264","url":null,"abstract":"LiDAR based Simultaneous Localization and Mapping (LiDAR SLAM) plays a vital role in autonomous driving and has attracted the attention of researchers. In order to achieve higher accuracy of motion estimation between adjacent LiDAR frames and reconstruction of the map, a segmentation-based LiDAR odometry and mapping framework is proposed in this paper. In detail, we first define the classification of several features with weak semantic information, the extraction method of which is achieved by a segmentation algorithm proposed in this paper that is based on greedy search. Based on the above work, a novel point cloud registration algorithm is also proposed in this paper, which is solved by modeling the problem as a nonlinear optimization problem. In order to verify the effectiveness of the proposed model, we collect a large amount of data in the autonomous driving test area to test it and compare the results with the existing state-of-the-art models. The experimental results show that the algorithm proposed in this paper can run stably in real-world autonomous driving scenarios and has smaller error and higher robustness compared with other models.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115385611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RHDDNet: multi-label classification-based detection of image hybrid distortions","authors":"Bowen Dou, Hai Helen Li, Shujuan Hou","doi":"10.1117/12.2643514","DOIUrl":"https://doi.org/10.1117/12.2643514","url":null,"abstract":"Image distortion detection is a key step in image quality assessment and image reconstruction algorithms. In previous work, a large number of research focus on detecting the single distortion in the image. However, the number of distortion types in the image is often uncertain. Thus, we propose a model that can be used for hybrid distortion detection. Concretely, we transform the hybrid distortion detection task into a multi-label classification task and abstract it as a convolutional network optimization problem. A dataset is created to train the model and evaluate its performance. Experiments show that the proposed model performs well in the detection of hybrid distortions in images.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115424077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concepts encoding via knowledge-guided self-attention networks","authors":"Kunnan Geng, Xin Li, Wenyao Zhang","doi":"10.1117/12.2644388","DOIUrl":"https://doi.org/10.1117/12.2644388","url":null,"abstract":"With the growth of digital data created by us, a large number of deep learning models have been proposed for data mining. Representation learning offers an exciting avenue to address data mining demands by embedding data into feature space. In the healthcare field, most existing methods are proposed to mine electronic health records (EHR) data by learning medical concept representations. Despite the vigorous development of this field, we find the contextual information of medical concepts has always been overlooked, which is important to represent these concepts. Given these limitations, we design a novel medical concept representation method, which is equipped with a self-attention mechanism to learn contextual representation from EHR data and prior knowledge. Extensive experiments on medication recommendation tasks verify the designed modules are consistently beneficial to model performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115748424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordinate transformation and three-dimensional backprojection algorithm for multiple receiver synthetic aperture sonar motion compensation","authors":"Zhen Tian, Heping Zhong, Jinsong Tang","doi":"10.1117/12.2643254","DOIUrl":"https://doi.org/10.1117/12.2643254","url":null,"abstract":"Motion compensation (MoCo) is an important step for obtaining a high-quality image of synthetic aperture sonar (SAS). In this paper, a novel three-dimensional (3D) backprojection (BP) algorithm is proposed to solve the MoCo question for the multiple receiver SAS with the six degrees of freedom (DOF) motion error. In order to improve the MoCo capacity of the proposed 3D BP algorithm, some more accurate position data of sonar array are calculated by the method of space rectangular coordinate transformation. According to the inherent relationship between sonar array and inertial navigation system (INS), the position data of sonar array at the sampling time of INS data are obtained based on the position data and attitude data output by INS, without any approximation. On the basis, the relatively accurate position data of sonar array at the time of pulse transmission are obtained by the method of linear interpolation. Considering the movement of SAS during the period of signal propagation, the signal propagation time for each pulse and each receiver are calculated. Moreover, the position data of each receiver of SAS at the time of signal reception are obtained. Based on above derived position data, a well-focused SAS image is obtained and the six DOF motion error are compensated simultaneously by the 3D BP algorithm. The result of experiment demonstrates the validity of the proposed algorithm.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121901756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}