{"title":"Weakly-supervised attention mechanism via score-CAM for fine-grained visual classification","authors":"Yizhou He, E. Zou, Q. Fan","doi":"10.1117/12.2644399","DOIUrl":"https://doi.org/10.1117/12.2644399","url":null,"abstract":"Along with the prosperity and development of computer vision technologies, fine-grained visual classification (FGVC) has now become an intriguing research field due to its broad application prospects. The major challenges of fine-grained classification are mainly two-fold: localization of discriminative region and extraction of fine-grained features. The attention mechanism is a common choice for current state-of-art (SOTA) methods in the FGVC that can significantly improve the performance of distinguishing among fine-grained categories. The attention module in different designs is utilized to capture the discriminative region, and region-based feature representation encodes subtle inter-class differences. However, the attention mechanism without proper supervision may not learn to provide informative guidance to the discriminative region, thus could be meaningless in the FGVC tasks that lack part annotations. We propose a weakly-supervised attention mechanism that integrates visual explanation methods to address confusing issues in the discriminative region localization caused by the absence of supervision and avoid labor-intensive bounding box/part annotations in the meanwhile. We employ Score-CAM, a novel post-hoc visual explanation method based on class activation mapping, to provide supervision and constrain the attention module. 
We conduct extensive experiments and show that the proposed method outperforms the current SOTA methods in three fine-grained classification tasks on CUB Birds, FGVC Aircraft, and Stanford Cars.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131425082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ship target detection method based on improved CenterNet in synthetic aperture radar images","authors":"Hongtu Xie, Xinqiao Jiang, Jiaxing Chen, Jian Zhang, Xiao Hu, Guoqian Wang, Kai Xie","doi":"10.1117/12.2644364","DOIUrl":"https://doi.org/10.1117/12.2644364","url":null,"abstract":"Deep learning has been widely used for the ship target detection in the synthetic aperture radar (SAR) images. The existing researches mainly uses the anchor frame-based detection method to generate the candidate frames to extract the specific targets. However, this method requires the additional computing resources to filter out the many repeated candidate frames, which will lead to the poor target positioning accuracy and low detection efficiency. To solve these problems, this paper constructs an anchor-free frame for the ship target detection in the SAR images. An improved lightweight detection method based on the target key point is proposed for the real-time detection of the SAR images, which can achieve the rapid and accurate positioning of the ship targets in the SAR images. The experimental results prove that the proposed method has the better detection performance and stronger generalization capability, which is beneficial to realize the real-time detection of the ship targets.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131574443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction of images projected on non-white surfaces based on deep neural network","authors":"Zewei Wang, Jingjing Zhang, X. Du, Sihua Cao, Wenxuan Wei","doi":"10.1117/12.2644283","DOIUrl":"https://doi.org/10.1117/12.2644283","url":null,"abstract":"When projecting onto a non-white surface, the projected image is distorted or color mixing by complex luminance and chrominance information, which makes the projection result different from the visual perception of the human eye. The purpose of projection image correction is to remove these effects, and traditional solutions usually estimate parameters from the collected projection samples, compute an inverse model of the projection imaging process, and try to fit a correction function. In this paper, a deep neural network-based projection image correction network (PICN) is designed to implicitly learn complex correction functions. PICN consists of a U-shaped backbone network, a convolutional neural network that extracts projected surface features, and a perceptual loss network that optimizes the correction results. Such a structure can not only extract the deep features and surface interference features of the projected image, but also make the corrected projected image more in line with human visual perception. 
In addition, we built a projector-camera system under a fixed global illumination environment for verification experiments, and demonstrated the effectiveness of the proposed method by computing evaluation metrics on projected images before and after correction.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114591995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pre-rotation only at inference-time: a way to rotation invariance","authors":"Peng Zhang, Jinsong Tang, Heping Zhong, Mingqiang Ning, Yue Fan","doi":"10.1117/12.2644390","DOIUrl":"https://doi.org/10.1117/12.2644390","url":null,"abstract":"Weight sharing across different locations makes Convolutional Neural Networks (CNNs) space shift invariant, i.e., the weights learned in one location can be applied to recognize objects in other locations. However, weight sharing mechanism has been lacked in Rotated Pattern Recognition (RPR) tasks, and CNNs have to learn training samples in different orientations by rote. As such rote-learning strategy has greatly increased the difficulty of training, a new solution for RPR tasks, Pre-Rotation Only At Inference time (PROAI), is proposed to provide CNNs with rotation invariance. The core idea of PROAI is to share CNN weights across multiple rotated versions of the test sample. At the training time, a CNN was trained with samples only in one angle; at the inference-time, test samples were pre-rotated at different angles and then fed into the CNN to calculate classification confidences; at the end both the category and the orientation were predicted using the position of the max value of these confidences. By adopting PROAI, the recognition ability learned at one orientation can be generalized to patterns at any other orientation, and both the number of parameters and the training time of CNN in RPR tasks can be greatly reduced. 
Experiments show that PROAI enables CNNs with fewer parameters and less training time to achieve state-of-the-art classification and orientation performance on both the rotated MNIST and rotated Fashion-MNIST datasets.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121824743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved U-shape neural network for soft exudate segmentation","authors":"Hongda Zhang, Kaixin Lin, Yuxiang Guan, Zhongxue Gan, Chun Ouyang","doi":"10.1117/12.2644349","DOIUrl":"https://doi.org/10.1117/12.2644349","url":null,"abstract":"Diabetic Retinopathy (DR) is a complication with a high blindness rate caused by diabetes. The diagnosis of DR requires examining the patient's fundus several times a year, which is a heavy burden for a patient and consumes a lot of medical resources. Since soft exudate is an early indicator for detecting the presence of DR, an automated and exact segmentation method for soft exudate is helpful for making a rapid diagnosis. Despite recent advances in medical image processing, the segmentation method of soft exudate is still unsatisfactory due to the limited amount of soft exudate data, imbalanced categories, varying scales and so on. In this work, an improved U-shape neural network (IUNet) was proposed according to the characteristic of soft exudate, which consisted of a contracting path and a symmetric expanding path. Both were composed of convolutional layers, multi-scale modules, and shortcut connections. In training process, a data enhancement strategy was used to generate more training data and a weighted cross-entropy loss function to suppress positive and negative sample imbalance. The proposed method had excellent performance on soft exudate task in Indian Diabetic Retinopathy Image Dataset (IDRiD). 
The area under the precision-recall curve (AUPR) score was 0.711, superior to state-of-the-art models.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115410601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic three-dimensional measurement using convolution neural network and binocular structured light system","authors":"Mingxin Chen, Jindong Tian, Dong Li","doi":"10.1117/12.2644400","DOIUrl":"https://doi.org/10.1117/12.2644400","url":null,"abstract":"In this paper, a dynamic three-dimensional measurement method based on convolution neural network and binocular structured light system is proposed. We propose a convolution neural network to extract the real and imaginary terms of the first-order spectrum of a single frame fringe pattern. In our learning model, the loss function is established with output consistency, phase consistency and feature consistency as the joint constraints. And the dataset is built with actual deformed patterns of different scenes and frequencies. Furthermore, a dual frequency stereo phase unwrapping algorithm based on virtual plane is designed. Combined with the network, the absolute phase can be obtained by only two fringe projections in the measurement range, enabling the dynamic three-dimensional reconstruction of discontinuous or multiple isolated objects. The experimental results show the proposed network can significantly improve the accuracy of phase retrieve by 20 times compared to Fourier Transform Profilometry and the measurement error of the measurement system proposed in this paper for calibration sphere is less than 0.04mm. 
Furthermore, the measurement results of the dynamic process of palm unfolding verify the feasibility and effectiveness of the proposed method.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"551 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116234052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting of plant protection spraying by excited luminescence","authors":"M. Szychta, T. Szulc, Ł. Gierz, K. Przybył, K. Koszela","doi":"10.1117/12.2645947","DOIUrl":"https://doi.org/10.1117/12.2645947","url":null,"abstract":"The research was carried out to detect plant protection products on plant leaves using excited luminescence. The tests were carried out for the leaves of two random plant species and four typical plant protection products. The study of excitation and emission spectra (EX-EM) was carried out on the Edinburgh Instruments FS900 luminescence spectrometer equipped with an attachment for measurements from the surface. The collected EX-EM characteristics of clean leaves were compared with the EX-EM characteristics of leaves coated with plant protection products and the EX-EM characteristics of the agents themselves. The obtained results allowed for the assessment of the suitability of the excited luminescence method for the plant protection spraying measuring, including the detection or identification of inappropriate use of plant protection products by a farmer.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126250337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reversible data hiding simultaneously using substitution of MSB and compression of LSB of encrypted image","authors":"L. Sui, Jie Liu, Ying-Ying Cheng, Zhaolin Xiao, A. Tian","doi":"10.1117/12.2643707","DOIUrl":"https://doi.org/10.1117/12.2643707","url":null,"abstract":"With the development of privacy protection, reversible data hiding methods in encrypted image have drawn extensive research interest. Among them, a new method is proposed based on embedding prediction errors, i.e., EPE-based method, where secret information is embedded in the encrypted most significant bit plane. Not only the original image can be recovered with high quality but also the payload can reach close to 1 bit per pixel (bpp). However, there are potential errors in the process of extracting secret data, because most significant bits of a part of pixels are used as flags to mark prediction error location. In this paper, a reversible data hiding method in encrypted image with high capacity is proposed by combining most significant bit prediction with least significant bit compression. At first, most significant bit of each pixel is predicted and a location map of prediction errors in the original image is generated. In the same time, the original image is encrypted using a stream cipher method. Then, the location map is embedded into the vacated space generated with compressing least significant bits and the secret data is embedded into most significant bits of a part of pixels without prediction errors. In this way, the marked encrypted image is obtained. Finally, the original image can be recovered without any error and the secret information can be extracted correctly from the marked encrypted image. 
Experimental results show that the proposed method has better performance than the EPE-based method and other methods.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"39 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125741705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating lower extremity joint angles during gait using reduced number of sensors count via deep learning","authors":"M. Hossain, Hwan Choi, Zhishan Guo","doi":"10.1117/12.2643786","DOIUrl":"https://doi.org/10.1117/12.2643786","url":null,"abstract":"Estimating lower extremity joint angle during gait is essential for biomechanical analysis and clinical purposes. Traditionally infrared light-based motion capture systems are used to get the joint angle information. However, such an approach is restricted to the lab environment, limiting the applicability of the method in daily living. Inertial Measurement Units (IMU) sensors can solve this limitation but are needed in each body segment, causing discomfort and impracticality in everyday living. As a result, it is desirable to build a system that can measure joint angles in daily living while ensuring user comfort. For this reason, this paper uses deep learning to estimate joint angle during gait using only two IMU sensors mounted on participants' shoes under four different walking conditions, i.e., treadmill, overground, stair, and slope. Specifically, we leverage Gated Recurrent Unit (GRU), 1D, and 2D convolutional layers to create sub-networks and take their average to get a final model in an end-to-end manner. 
Extensive evaluations are done on the proposed method, which outperforms the baseline and improves the Root Mean Square Error (RMSE) of joint angle prediction by up to 32.96%.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125884660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved approach for two-stage detection model","authors":"J. Liu, Xiaolong Ma","doi":"10.1117/12.2643051","DOIUrl":"https://doi.org/10.1117/12.2643051","url":null,"abstract":"With the rapid development of deep learning models, the performance of object detection have made great success in recent years. However, the problem of low detection efficiency still exists in two-stage detection model. In this paper, we design a lightweight fully convolution neural network(LFCNN) as backbone to extract features more efficiently. Firstly, LFCNN is a lightweight network with only a small number of network parameters, which ensures that it can complete the feature extraction task more quickly while maintaining detection accuracy. Secondly, LFCNN uses residual connection to ensure the performance of the deep network and uses dense connection to realize the reuse and fusion of multi-layer features of the network, which significantly improve the detection accuracy. Moreover, we also come up with a novel method called anchor scale generator(ASG) to obtain the proper predefined anchor scales for generating more accurate region proposals, which further enhances localization ability of objects. 
Extensive experiments on the Pascal VOC and COCO datasets show that our approach is superior to other methods in both bounding-box localization accuracy and detection performance.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128178193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}