{"title":"A joint feature aggregation method for robust masked face recognition","authors":"Xinmeng Xu, Yuesheng Zhu, Zhiqiang Bai","doi":"10.1117/12.2643615","DOIUrl":"https://doi.org/10.1117/12.2643615","url":null,"abstract":"Masked face recognition becomes an important issue of prevention and monitor in outbreak of COVID-19. Due to loss of facial features caused by masks, unmasked face recognition could not identify the specific person well. Current masked faces methods focus on local features from the unmasked regions or recover masked faces to fit standard face recognition models. These methods only focus on partial information of faces thus these features are not robust enough to deal with complex situations. To solve this problem, we propose a joint feature aggregation method for robust masked face recognition. Firstly, we design a multi-module feature extraction network to extract different features, including local module (LM), global module (GM), and recovery module (RM). Our method not only extracts global features from the original masked faces but also extracts local features from the unmasked area since it is a discriminative part of masked faces. Specially, we utilize a pretrained recovery model to recover masked faces and get some recovery features from the recovered faces. Finally, features from three modules are aggregated as a joint feature of masked faces. The joint feature enhances the feature representation of masked faces thus it is more discriminative and robust than that in previous methods. Experiments show that our method can achieve better performance than previous methods on LFW dataset.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125393058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-temporal dual-attention network for view-invariant human action recognition","authors":"Kumie Gedamu, Getinet Yilma, Maregu Assefa, Melese Ayalew","doi":"10.1117/12.2643446","DOIUrl":"https://doi.org/10.1117/12.2643446","url":null,"abstract":"Due to the action occlusion and information loss caused by the view changes, view-invariant human action recognition is challenging in plenty of real-world applications. One possible solution to this problem is minimizing representation discrepancy in different views while learning discriminative feature representation for view-invariant action recognition. To solve the problem, we propose a Spatio-temporal Dual-Attention Network (SDA-Net) for view-invariant human action recognition. The SDA-Net is composed of a spatial/temporal self-attention and spatial/temporal cross-attention modules. The spatial/temporal self-attention module captures global long-range dependencies of action features. The cross-attention module is designed to learn view-invariant co-occurrence attention maps and generates discriminative features for a semantic representation of actions in different views. We exhaustively evaluate our approach on the NTU- 60, NTU-120, and UESTC datasets with multi-type evaluations, i.e., Cross-Subject, Cross-View, Cross-Set, and Arbitrary-view. Extensive experiment results demonstrate that our approach exceeds the state-of-the-art approaches with a significant margin in view-invariant human action recognition.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121413001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ship detection in optical remote sensing images based on saliency and rotation-invariant feature","authors":"Donglai Wu, Bingxin Liu, Wanhan Zhang","doi":"10.1117/12.2644322","DOIUrl":"https://doi.org/10.1117/12.2644322","url":null,"abstract":"Ship detection is important to guarantee maritime safety at sea. In optical remote sensing images, the detection efficiency and accuracy are limited due to the complex ocean background and variant ship directions. Therefore, we propose a novel ship detection method, which consists of two main stages: candidate area location and target discrimination. In the first stage, we use the spectral residual method to detect the saliency map of the original image, get the saliency sub-map containing the ship target, and then use the threshold segmentation method to obtain the ship candidate region. In the second stage, we obtain the radial gradient histogram of the ship candidate region and transform it into a radial gradient feature, which is rotation-invariant. Afterward, radial gradient features and LBP features are fused, and SVM is used for ship detection. Data experimental results show that the method has the characteristics of low complexity and high detection accuracy.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"321 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114015131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time image distortion correction based on FPGA","authors":"R. Ou, Danni Ai, X. Hu, Zhao Zheng, Yu Qiu, Jian Yang","doi":"10.1117/12.2644433","DOIUrl":"https://doi.org/10.1117/12.2644433","url":null,"abstract":"As the primary method for real-time image processing, a field-programmable gate array (FPGA) is widely used in binocular vision systems. Distortion correction is an important component of binocular stereo vision systems. When implementing a real-time image distortion correction algorithm on FPGA, problems, such as insufficient on-chip storage space and high complexity of coordinate correction calculation methods, occur. These problems are analyzed in detail in this study. On the basis of the reverse mapping method, a distortion correction algorithm that uses a lookup table (LUT) is proposed. A compression with restoration method is established for this LUT to reduce space occupation. The corresponding cache method of LUT and the image data are designed. The algorithm is verified on our binocular stereo vision system based on Xilinx Zynq-7020. The experiments show that the proposed algorithm can achieve real-time and high precision gray image distortion correction effect and significantly reduce the consumption of on-chip resources. Enough to meet the requirements of accurate binocular stereo vision system.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127783528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SlowFast with DropBlock and smooth samples loss for student action recognition","authors":"Chuanming Li, Wenxing Bao, Xu Chen, Yongjun Jing, Xiudong Qu","doi":"10.1117/12.2644370","DOIUrl":"https://doi.org/10.1117/12.2644370","url":null,"abstract":"Due to the advent of large-scale video datasets, action recognition using three-dimensional convolutions (3D CNNs) containing spatiotemporal information has become mainstream. Aiming at the problem of classroom student behavior recognition, the paper adopts the improved SlowFast network structure to deal with spatial structure and temporal events respectively. First, DropBlock (a regularization method) is added to the SlowFast network to solve the overfitting problem. Second, for the problem of Long-Tailed Distribution, the designed Smooth Sample (SS) Loss function is added to the network to smooth the number of samples. Classification experiments show that compared with similar methods, the model accuracy of our method on the Kinetics and Student Action Dataset is increased by 2.1% and 2.9%, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132665226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate neuroanatomy segmentation using 3D spatial and anatomical attention neural networks","authors":"Hewei Cheng, Zhengyu Ren, Peiyang Li, Yin Tian, Wei Wang, Zhangyong Li, Yongjiao Fan","doi":"10.1117/12.2644416","DOIUrl":"https://doi.org/10.1117/12.2644416","url":null,"abstract":"Brain structure segmentation from 3D magnetic resonance (MR) images is a prerequisite for quantifying brain morphology. Since typical 3D whole brain deep learning models demand large GPU memory, 3D image patch-based deep learning methods are favored for their GPU memory efficiency. However, existing 3D image patch-based methods are not well equipped to capture spatial and anatomical contextual information that is necessary for accurate brain structure segmentation. To overcome this limitation, we develop a spatial and anatomical context-aware network to integrate spatial and anatomical contextual information for accurate brain structure segmentation from MR images. Particularly, a spatial attention block is adopted to encode spatial context information of the 3D patches, an anatomical attention block is adopted to aggregate image information across channels of the 3D patches, and finally the spatial and anatomical attention blocks are adaptively fused by an element-wise convolution operation. Moreover, an online patch sampling strategy is utilized to train a deep neural network with all available patches of the training MR images, facilitating accurate segmentation of brain structures. Ablation and comparison results have demonstrated that our method is capable of achieving promising segmentation performance, better than state-of-the-art alternative methods by 3.30% in terms of Dice scores.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134373062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A foreground detection based video stabilization method and its application in aerospace measurement and control","authors":"L. Zhang, Chen Chen, Jinqian Tao, Zhaodun Huang, Hao Ding","doi":"10.1117/12.2644567","DOIUrl":"https://doi.org/10.1117/12.2644567","url":null,"abstract":"The output video of the optical equipment in the aerospace measurement and control field is prone to the problem of image quality degradation caused by the operator’s unstable manual operation. to improve the classical motion estimation based video stabilization algorithm, a novel video stabilization method based on foreground detection is proposed in this paper. Firstly, a object detection datasets based on historical images of the launch center is collected and labeled. Secondly, inspired by transfer learning and prior knowledge of the image in launch center, a YOLO-based object detection method for rocket launching scene is designed. Then, the object detection method is introduced into the motion estimation based video stabilization pipeline in which the object detection is used for foreground detection so the tracked feature points are filtered to reduce the global motion estimation error caused by the motion of the background area. Thus, the error stabilization problem in the classic motion estimation-based video stabilization method is avoided. Experiments show that the video stabilization method proposed in this paper achieved better image stabilization effect in subject and object evaluation. This paper has certain reference significance for exploring the application of deep learning and artificial intelligence technology in the field of aerospace measurement and control field.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134645578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Haze removal using a hybrid convolutional sparse representation model","authors":"Ye Cai, Lan Luo, Hongxia Gao, Shicheng Niu, Weipeng Yang, Tian Qi, Guoheng Liang","doi":"10.1117/12.2643362","DOIUrl":"https://doi.org/10.1117/12.2643362","url":null,"abstract":"Haze removal is a challenging task in image recovery, because hazy images are always degraded by turbid media in atmosphere, showing limited visibility and low contrast. Analysis Sparse Representation (ASR) and Synthesis Sparse Representation (SSR) has been widely used to recover degraded images. But there are always unexpected noise and details loss in the recovered images, as they take relatively less account of the images’ inherent coherence between image patches. Thus, in this paper, we propose a new haze removal method based on hybrid convolutional sparse representation, with consideration of the adjacent relationship by convolution and superposition. To integrate optical model into a convolutional sparse framework, we separate transmission map by transforming it into logarithm domain. And then a structure-based constraint on transmission map is proposed to maintain piece-wise smoothness and reduce the influence brought by pseudo depth abrupt edges. Experiment results demonstrate that the proposed method can restore fine structure of hazy images and suppress boosted noise.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133529156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperspectral remote sensing image semantic segmentation using extended extrema morphological profiles","authors":"Tengyu Ma, Yunfei Liu, Weijian Huang, Chun Wang, Shuangquan Ge","doi":"10.1117/12.2643022","DOIUrl":"https://doi.org/10.1117/12.2643022","url":null,"abstract":"Hyperspectral remote sensing images have been shown to be particularly beneficial for detecting the types of materials in a scene due to their unique spectral properties. This paper proposes a novel semantic segmentation method for hyperspectral image (HSI), which is based on a new spatial-spectral filtering, called extended extrema morphological profiles (EEMPs). Firstly, principal component analysis (PCA) is used as the feature extractor to construct the feature maps by extracting the first informative feature from the hyperspectral image (HSI). Secondly, the extrema morphological profiles (EMPs) are used to extract the spatial-spectral feature from the informative feature maps to construct the EEMPs. Finally, support vector machine (SVM) is utilized to obtain accurate semantic segmentation from the EEMPs. In order to evaluate the semantic segmentation results, the proposed method is tested on a widely used hyperspectral dataset, i.e., Houston dataset, and four metrics, i.e., class accuracy (CA), overall accuracy (OA), average accuracy (AA), and Kappa coefficient, are used to quantitatively measure the segmentation accuracy. The experimental results demonstrate that EEMPs can efficiently achieve good semantic segmentation accuracy.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133170697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperspectral image classification based on dual-branch attention network with 3-D octave convolution","authors":"Ling Xu, Guo Cao, Lin Deng, Lanwei Ding, Hao Xu, Qikun Pan","doi":"10.1117/12.2644256","DOIUrl":"https://doi.org/10.1117/12.2644256","url":null,"abstract":"Hyperspectral Image (HSI) classification aims to assign each hyperspectral pixel with an appropriate land-cover category. In recent years, deep learning (DL) has received attention from a growing number of researchers. Hyperspectral image classification methods based on DL have shown admirable performance, but there is still room for improvement in terms of exploratory capabilities in spatial and spectral dimensions. To improve classification accuracy and reduce training samples, we propose a double branch attention network (OCDAN) based on 3-D octave convolution and dense block. Especially, we first use a 3-D octave convolution model and dense block to extract spatial features and spectral features respectively. Furthermore, a spatial attention module and a spectral attention module are implemented to highlight more discriminative information. Then the extracted features are fused for classification. Compared with the state-of-the-art methods, the proposed framework can achieve superior performance on two hyperspectral datasets, especially when the training samples are signally lacking. In addition, ablation experiments are utilized to validate the role of each part of the network.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133156087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}