{"title":"A joint feature aggregation method for robust masked face recognition","authors":"Xinmeng Xu, Yuesheng Zhu, Zhiqiang Bai","doi":"10.1117/12.2643615","DOIUrl":"https://doi.org/10.1117/12.2643615","url":null,"abstract":"Masked face recognition becomes an important issue of prevention and monitor in outbreak of COVID-19. Due to loss of facial features caused by masks, unmasked face recognition could not identify the specific person well. Current masked faces methods focus on local features from the unmasked regions or recover masked faces to fit standard face recognition models. These methods only focus on partial information of faces thus these features are not robust enough to deal with complex situations. To solve this problem, we propose a joint feature aggregation method for robust masked face recognition. Firstly, we design a multi-module feature extraction network to extract different features, including local module (LM), global module (GM), and recovery module (RM). Our method not only extracts global features from the original masked faces but also extracts local features from the unmasked area since it is a discriminative part of masked faces. Specially, we utilize a pretrained recovery model to recover masked faces and get some recovery features from the recovered faces. Finally, features from three modules are aggregated as a joint feature of masked faces. The joint feature enhances the feature representation of masked faces thus it is more discriminative and robust than that in previous methods. Experiments show that our method can achieve better performance than previous methods on LFW dataset.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125393058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-temporal dual-attention network for view-invariant human action recognition","authors":"Kumie Gedamu, Getinet Yilma, Maregu Assefa, Melese Ayalew","doi":"10.1117/12.2643446","DOIUrl":"https://doi.org/10.1117/12.2643446","url":null,"abstract":"Due to the action occlusion and information loss caused by the view changes, view-invariant human action recognition is challenging in plenty of real-world applications. One possible solution to this problem is minimizing representation discrepancy in different views while learning discriminative feature representation for view-invariant action recognition. To solve the problem, we propose a Spatio-temporal Dual-Attention Network (SDA-Net) for view-invariant human action recognition. The SDA-Net is composed of a spatial/temporal self-attention and spatial/temporal cross-attention modules. The spatial/temporal self-attention module captures global long-range dependencies of action features. The cross-attention module is designed to learn view-invariant co-occurrence attention maps and generates discriminative features for a semantic representation of actions in different views. We exhaustively evaluate our approach on the NTU- 60, NTU-120, and UESTC datasets with multi-type evaluations, i.e., Cross-Subject, Cross-View, Cross-Set, and Arbitrary-view. Extensive experiment results demonstrate that our approach exceeds the state-of-the-art approaches with a significant margin in view-invariant human action recognition.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121413001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ship detection in optical remote sensing images based on saliency and rotation-invariant feature","authors":"Donglai Wu, Bingxin Liu, Wanhan Zhang","doi":"10.1117/12.2644322","DOIUrl":"https://doi.org/10.1117/12.2644322","url":null,"abstract":"Ship detection is important to guarantee maritime safety at sea. In optical remote sensing images, the detection efficiency and accuracy are limited due to the complex ocean background and variant ship directions. Therefore, we propose a novel ship detection method, which consists of two main stages: candidate area location and target discrimination. In the first stage, we use the spectral residual method to detect the saliency map of the original image, get the saliency sub-map containing the ship target, and then use the threshold segmentation method to obtain the ship candidate region. In the second stage, we obtain the radial gradient histogram of the ship candidate region and transform it into a radial gradient feature, which is rotation-invariant. Afterward, radial gradient features and LBP features are fused, and SVM is used for ship detection. Data experimental results show that the method has the characteristics of low complexity and high detection accuracy.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"321 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114015131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time image distortion correction based on FPGA","authors":"R. Ou, Danni Ai, X. Hu, Zhao Zheng, Yu Qiu, Jian Yang","doi":"10.1117/12.2644433","DOIUrl":"https://doi.org/10.1117/12.2644433","url":null,"abstract":"As the primary method for real-time image processing, a field-programmable gate array (FPGA) is widely used in binocular vision systems. Distortion correction is an important component of binocular stereo vision systems. When implementing a real-time image distortion correction algorithm on FPGA, problems, such as insufficient on-chip storage space and high complexity of coordinate correction calculation methods, occur. These problems are analyzed in detail in this study. On the basis of the reverse mapping method, a distortion correction algorithm that uses a lookup table (LUT) is proposed. A compression with restoration method is established for this LUT to reduce space occupation. The corresponding cache method of LUT and the image data are designed. The algorithm is verified on our binocular stereo vision system based on Xilinx Zynq-7020. The experiments show that the proposed algorithm can achieve real-time and high precision gray image distortion correction effect and significantly reduce the consumption of on-chip resources. Enough to meet the requirements of accurate binocular stereo vision system.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127783528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SlowFast with DropBlock and smooth samples loss for student action recognition","authors":"Chuanming Li, Wenxing Bao, Xu Chen, Yongjun Jing, Xiudong Qu","doi":"10.1117/12.2644370","DOIUrl":"https://doi.org/10.1117/12.2644370","url":null,"abstract":"Due to the advent of large-scale video datasets, action recognition using three-dimensional convolutions (3D CNNs) containing spatiotemporal information has become mainstream. Aiming at the problem of classroom student behavior recognition, the paper adopts the improved SlowFast network structure to deal with spatial structure and temporal events respectively. First, DropBlock (a regularization method) is added to the SlowFast network to solve the overfitting problem. Second, for the problem of Long-Tailed Distribution, the designed Smooth Sample (SS) Loss function is added to the network to smooth the number of samples. Classification experiments show that compared with similar methods, the model accuracy of our method on the Kinetics and Student Action Dataset is increased by 2.1% and 2.9%, respectively.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132665226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate neuroanatomy segmentation using 3D spatial and anatomical attention neural networks","authors":"Hewei Cheng, Zhengyu Ren, Peiyang Li, Yin Tian, Wei Wang, Zhangyong Li, Yongjiao Fan","doi":"10.1117/12.2644416","DOIUrl":"https://doi.org/10.1117/12.2644416","url":null,"abstract":"Brain structure segmentation from 3D magnetic resonance (MR) images is a prerequisite for quantifying brain morphology. Since typical 3D whole brain deep learning models demand large GPU memory, 3D image patch-based deep learning methods are favored for their GPU memory efficiency. However, existing 3D image patch-based methods are not well equipped to capture spatial and anatomical contextual information that is necessary for accurate brain structure segmentation. To overcome this limitation, we develop a spatial and anatomical context-aware network to integrate spatial and anatomical contextual information for accurate brain structure segmentation from MR images. Particularly, a spatial attention block is adopted to encode spatial context information of the 3D patches, an anatomical attention block is adopted to aggregate image information across channels of the 3D patches, and finally the spatial and anatomical attention blocks are adaptively fused by an element-wise convolution operation. Moreover, an online patch sampling strategy is utilized to train a deep neural network with all available patches of the training MR images, facilitating accurate segmentation of brain structures. Ablation and comparison results have demonstrated that our method is capable of achieving promising segmentation performance, better than state-of-the-art alternative methods by 3.30% in terms of Dice scores.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134373062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face tampering detection based on spatiotemporal attention residual network","authors":"Z. Cai, Weimin Wei, Fanxing Meng, Changan Liu","doi":"10.1117/12.2644654","DOIUrl":"https://doi.org/10.1117/12.2644654","url":null,"abstract":"Fake technology has evolved to the point where fake faces are increasingly difficult to distinguish from real ones. If the forged face videos spread wildly on social media, social unrest or personal reputation damage may lead to social unrest. A face tampering detection method (RALNet) with spatiotemporal attention residual network is designed to reduce the misuse of face data due to malicious dissemination. Firstly, we propose a process to extract video face data, which reduces the interference of irrelevant information and improves the utilization of data processing. Then, based on the characteristics of incoherence and inconsistency in spatial and temporal information of tampered videos, the spatial domain features and temporal domain features of the target face video are extracted by introducing an attention mechanism of residual network and long short-term memory network to classify the targets as true or fake. The experimental results show that the method can effectively detect whether the face data is tampered, and its detection accuracy is better than other methods. In addition, it also achieves good performance in terms of recall, precision, and F1 score.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"377 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115174057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Alzheimer’s disease from 4D fMRI using hybrid 3DCNN and GRU networks","authors":"Yifan Cao, Meili Lu, Jiajun Fu, Zhaohua Guo, Zicheng Gao","doi":"10.1117/12.2644454","DOIUrl":"https://doi.org/10.1117/12.2644454","url":null,"abstract":"In recently years, motivated by the excellent performance in automatic feature extraction and complex patterns detecting from raw data, recently, deep learning technologies have been widely used in analyzing fMRI data for Alzheimer’s disease classification. However, most current studies did not take full advantage of the temporal and spatial features of fMRI, which may result in ignoring some important information and influencing classification performance. In this paper, we propose a novel approach based on deep learning to learn temporal and spatial features of 4D fMRI for Alzheimer’s disease classification. This model is composed of 3D Convolutional Neural Network(3DCNN) and recurrent neural network. Experimental results demonstrated that the proposed approach could discriminate Alzheimer’s patients from healthy controls with a high accuracy rate.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117122171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind image quality assessment based on transformer","authors":"Linxin Li, Chu Chen, Naixuan Zhao","doi":"10.1117/12.2643493","DOIUrl":"https://doi.org/10.1117/12.2643493","url":null,"abstract":"Transformer has achieved milestones in natural language processing (NLP). Due to its excellent global and remote semantic information interaction performance, it has gradually been applied in vision tasks. In this paper, we propose PTIQ, which is a pure Transformer structure for Image Quality Assessment. Specifically, we use Swin Transformer Blocks as backbone to extract image features. The extracted feature vectors after extra state embedding and position embedding are fed into the original transformer encoder. Then, the output is passed to the MLP head to predict quality score. Experimental results demonstrate that the proposed architecture achieves outstanding performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124943319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monocular inertial indoor location algorithm considering point and line features","authors":"Ju Huo, Liang Wei, Chuwei Mao","doi":"10.1117/12.2644842","DOIUrl":"https://doi.org/10.1117/12.2644842","url":null,"abstract":"Compared with point features, line features in the environment have more structural information. When indoor texture is not rich, making full use of the structural information of line features can improve the robustness and accuracy of simultaneous location and mapping algorithm. In this paper, we propose an improved monocular inertial indoor location algorithm considering point and line features. Firstly, the point features and line features in the environment are extracted, matched and parameterized, and then the inertial sensor is used to estimate the initial pose, and the tightly coupled method is adopted to optimize the observation error of the point and line features and the measurement error of the inertial sensor simultaneously in the back optimization to achieve accurate estimation of the pose of unmanned aerial vehicle. Finally, loop closure detection and pose graph optimization are used to optimize the pose in real time. The test results on public datasets show that the location accuracy of the proposed method is superior to 10 cm under sufficient light and texture conditions. The angle measurement accuracy is better than 0.05 rad, and the output frequency of positioning results is 10Hz, which effectively improves the accuracy of traditional visual inertial location method and meets the requirements of real-time.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123054438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}