A foreground detection based video stabilization method and its application in aerospace measurement and control
L. Zhang, Chen Chen, Jinqian Tao, Zhaodun Huang, Hao Ding
International Conference on Digital Image Processing, 2022-10-12. DOI: 10.1117/12.2644567 (https://doi.org/10.1117/12.2644567)

Abstract: Video output by optical equipment in the aerospace measurement and control field is prone to image quality degradation caused by operators' unstable manual operation. To improve on the classical motion-estimation-based video stabilization algorithm, this paper proposes a novel video stabilization method based on foreground detection. First, an object detection dataset based on historical images of the launch center is collected and labeled. Second, drawing on transfer learning and prior knowledge of launch-center imagery, a YOLO-based object detection method for rocket launching scenes is designed. The detector is then introduced into the motion-estimation-based video stabilization pipeline as a foreground detector, so that the tracked feature points are filtered to reduce the global motion estimation error caused by motion in the background area. The erroneous stabilization of the classic motion-estimation-based method is thereby avoided. Experiments show that the proposed method achieves better stabilization in both subjective and objective evaluations. This work offers a reference for applying deep learning and artificial intelligence techniques in the aerospace measurement and control field.
Hyperspectral remote sensing image semantic segmentation using extended extrema morphological profiles
Tengyu Ma, Yunfei Liu, Weijian Huang, Chun Wang, Shuangquan Ge
International Conference on Digital Image Processing, 2022-10-12. DOI: 10.1117/12.2643022 (https://doi.org/10.1117/12.2643022)

Abstract: Hyperspectral remote sensing images are particularly well suited to detecting the types of materials in a scene owing to their unique spectral properties. This paper proposes a novel semantic segmentation method for hyperspectral images (HSIs) based on a new spatial-spectral filter, the extended extrema morphological profiles (EEMPs). First, principal component analysis (PCA) is used as the feature extractor, constructing feature maps from the first informative feature of the HSI. Second, extrema morphological profiles (EMPs) extract spatial-spectral features from the informative feature maps to construct the EEMPs. Finally, a support vector machine (SVM) produces the semantic segmentation from the EEMPs. The method is evaluated on the widely used Houston hyperspectral dataset with four metrics: class accuracy (CA), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. The experimental results demonstrate that EEMPs efficiently achieve good semantic segmentation accuracy.
{"title":"Makeup transfer model based on BeautyGAN","authors":"Feng Zhang, Chunman Yan, Chen Qiu","doi":"10.1117/12.2644376","DOIUrl":"https://doi.org/10.1117/12.2644376","url":null,"abstract":"Facial makeup transfer can realize automatic application of any makeup styles on the target face without the change of face identity. BeautyGAN enables unsupervised makeup transfer, but there are several problems with generated images, that is, partial loss of makeup effect, poor performance in makeup transfer while the input images or backgrounds are complex, and difficulty in transferring low-resolution images directly. To solve these problems, BeautyGAN, an existing makeup transfer model, was optimized. Referring to the fast style transfer algorithm, a BeautyGAN-based makeup transfer model was designed and developed by introducing a perceptual loss model to improve the performance of BeautyGAN in extracting facial features. The input image is preprocessed by SRGAN network to adapt low-resolution images to BeautyGAN model. The results show that the optimized BeautyGAN has improved local migration performance and can be put into real time operation during testing. Compared with BeautyGAN, the effect of makeup transfer has been significantly improved on the input images with facial expressions, facial occlusion or small angle pose. It is also compatible with low-resolution images.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133076521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperspectral image classification based on dual-branch attention network with 3-D octave convolution
Ling Xu, Guo Cao, Lin Deng, Lanwei Ding, Hao Xu, Qikun Pan
International Conference on Digital Image Processing, 2022-10-12. DOI: 10.1117/12.2644256 (https://doi.org/10.1117/12.2644256)

Abstract: Hyperspectral image (HSI) classification aims to assign each hyperspectral pixel an appropriate land-cover category. In recent years, deep learning (DL) has received attention from a growing number of researchers, and DL-based HSI classification methods have shown admirable performance, but there is still room for improvement in how they explore the spatial and spectral dimensions. To improve classification accuracy and reduce the required training samples, we propose a dual-branch attention network (OCDAN) based on 3-D octave convolution and dense blocks. We first use a 3-D octave convolution model and a dense block to extract spatial and spectral features, respectively. A spatial attention module and a spectral attention module then highlight the more discriminative information, and the extracted features are fused for classification. Compared with state-of-the-art methods, the proposed framework achieves superior performance on two hyperspectral datasets, especially when training samples are severely limited. Ablation experiments validate the role of each part of the network.
{"title":"SAR target image edge detection based on CNN","authors":"Wozhan Li, Xiaochuang Wu, Qiang Yang","doi":"10.1117/12.2643482","DOIUrl":"https://doi.org/10.1117/12.2643482","url":null,"abstract":"Aiming at the problems that the classical edge detection method is easily affected by noise and has low detection accuracy when applied to SAR target images, this paper studies the detection performance of the classical edge detection method Canny, CNN-based edge detection methods Holistically Nested Edge Detection (HED) and Richer Convolutional Features (RCF) when applied to SAR target images for the first time. The detection performance is evaluated using the MSTAR dataset, and the detection results of each method are compared based on the common evaluation indicators of image edge detection: F-measure, PR curve, and FPS. Canny's F-measure (ODS) is 0.611 and FPS is 43. The F-measure (ODS) of HED is 0.758 and the FPS is 18. The F-measure (ODS) of RCF is 0.729 and the FPS is 24. The F-measure (ODS) of RCF-MS is 0.753 and the FPS is 6. On the MSTAR dataset, the F-measure of HED is the best, which is 24.06% higher than Canny. RCF and RCF-MS also performed well, which were 19.31% and 23.24% higher than Canny respectively. The edge detection method based on CNN has higher F-measure, is less affected by noise, and has less loss of edge details. When applied to SAR images affected by speckle noise, the performance is much better than Canny, but there is still a shortage of slightly worse computing speed.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128609875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectrogram-based speech enhancement by spatial attention generative adversarial networks","authors":"Haixin Luo, Shengyu Lu, Qian Wei, Yu Fu, Jindong Tian","doi":"10.1117/12.2644385","DOIUrl":"https://doi.org/10.1117/12.2644385","url":null,"abstract":"The spectrogram can clearly show the composition of different frequencies in the speech signal. In this paper, a speech enhancement method based on deep learning image processing is proposed, which optimizes the spectrogram of the laser detected speech signal to achieve speech enhancement. The laser beam emitted by the laser Doppler vibrometer (LDV) is focused on the glass window to detect the vibration caused by sound wave. After conversion, the audio information that causes vibration is obtained. Under the interference of speckle noise and air disturbance, the detected speech signal not only has a low signal-to-noise ratio (SNR) but also has non-stationary noise. In order to overcome the difficulty that traditional methods are difficult to extract weak signals in the case of severe noise interference, we use deep learning to achieve spectrogram noise reduction and speech information enhancement. By processing the spectrogram of noisy speech with the generative adversarial networks (GAN) combined with the spatial attention mechanism and introducing the short-time objective intelligibility (STOI) into the loss function, the laser detected speech signal was successfully enhanced.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116809218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmentation based lidar odometry and mapping
Yinan Wang, Rongchuan Cao, Tianqi Zhang, Kun Yan, Xiaoli Zhang
International Conference on Digital Image Processing, 2022-10-12. DOI: 10.1117/12.2644264 (https://doi.org/10.1117/12.2644264)

Abstract: LiDAR-based simultaneous localization and mapping (LiDAR SLAM) plays a vital role in autonomous driving and has attracted researchers' attention. To achieve more accurate motion estimation between adjacent LiDAR frames and better map reconstruction, this paper proposes a segmentation-based LiDAR odometry and mapping framework. We first define several classes of features carrying weak semantic information and extract them with a proposed greedy-search-based segmentation algorithm. Building on this, we propose a novel point cloud registration algorithm that casts registration as a nonlinear optimization problem. To verify the model's effectiveness, we collect a large amount of data in an autonomous driving test area and compare the results with existing state-of-the-art models. The experiments show that the proposed algorithm runs stably in real-world autonomous driving scenarios and achieves smaller error and higher robustness than the other models.
{"title":"RHDDNet: multi-label classification-based detection of image hybrid distortions","authors":"Bowen Dou, Hai Helen Li, Shujuan Hou","doi":"10.1117/12.2643514","DOIUrl":"https://doi.org/10.1117/12.2643514","url":null,"abstract":"Image distortion detection is a key step in image quality assessment and image reconstruction algorithms. In previous work, a large number of research focus on detecting the single distortion in the image. However, the number of distortion types in the image is often uncertain. Thus, we propose a model that can be used for hybrid distortion detection. Concretely, we transform the hybrid distortion detection task into a multi-label classification task and abstract it as a convolutional network optimization problem. A dataset is created to train the model and evaluate its performance. Experiments show that the proposed model performs well in the detection of hybrid distortions in images.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115424077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concepts encoding via knowledge-guided self-attention networks","authors":"Kunnan Geng, Xin Li, Wenyao Zhang","doi":"10.1117/12.2644388","DOIUrl":"https://doi.org/10.1117/12.2644388","url":null,"abstract":"With the growth of digital data created by us, a large number of deep learning models have been proposed for data mining. Representation learning offers an exciting avenue to address data mining demands by embedding data into feature space. In the healthcare field, most existing methods are proposed to mine electronic health records (EHR) data by learning medical concept representations. Despite the vigorous development of this field, we find the contextual information of medical concepts has always been overlooked, which is important to represent these concepts. Given these limitations, we design a novel medical concept representation method, which is equipped with a self-attention mechanism to learn contextual representation from EHR data and prior knowledge. Extensive experiments on medication recommendation tasks verify the designed modules are consistently beneficial to model performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115748424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordinate transformation and three-dimensional backprojection algorithm for multiple receiver synthetic aperture sonar motion compensation","authors":"Zhen Tian, Heping Zhong, Jinsong Tang","doi":"10.1117/12.2643254","DOIUrl":"https://doi.org/10.1117/12.2643254","url":null,"abstract":"Motion compensation (MoCo) is an important step for obtaining a high-quality image of synthetic aperture sonar (SAS). In this paper, a novel three-dimensional (3D) backprojection (BP) algorithm is proposed to solve the MoCo question for the multiple receiver SAS with the six degrees of freedom (DOF) motion error. In order to improve the MoCo capacity of the proposed 3D BP algorithm, some more accurate position data of sonar array are calculated by the method of space rectangular coordinate transformation. According to the inherent relationship between sonar array and inertial navigation system (INS), the position data of sonar array at the sampling time of INS data are obtained based on the position data and attitude data output by INS, without any approximation. On the basis, the relatively accurate position data of sonar array at the time of pulse transmission are obtained by the method of linear interpolation. Considering the movement of SAS during the period of signal propagation, the signal propagation time for each pulse and each receiver are calculated. Moreover, the position data of each receiver of SAS at the time of signal reception are obtained. Based on above derived position data, a well-focused SAS image is obtained and the six DOF motion error are compensated simultaneously by the 3D BP algorithm. The result of experiment demonstrates the validity of the proposed algorithm.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121901756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}