{"title":"Multi-visual information fusion and aggregation for video action classification","authors":"Xuchao Gong, Zongmin Li, Xiangdong Wang","doi":"10.1117/12.2644312","DOIUrl":"https://doi.org/10.1117/12.2644312","url":null,"abstract":"To fully exploit spatio-temporal features for video action classification, we propose a multi-visual information fusion time sequence prediction network (MI-TPN), which is based on the feature aggregation model ActionVLAD. The method includes three parts: multi-visual information fusion, time sequence feature modeling and spatiotemporal feature aggregation. In multi-visual information fusion, RGB features and optical flow features are combined so that both visual context and action description details are fully considered. In time sequence feature modeling, the temporal relationship is modeled by an LSTM to obtain the importance measurement between temporal description features. Finally, in feature aggregation, time step features and a spatiotemporal center attention mechanism are used to aggregate features and project them into a common feature space. This method obtains good results on three commonly used comparative datasets: UCF101, HMDB51 and Something.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114891973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A defect detection method for plastic gears based on deep learning and machine vision","authors":"Y. Hao, Meng Xiang, Zichao Zhu","doi":"10.1117/12.2644273","DOIUrl":"https://doi.org/10.1117/12.2644273","url":null,"abstract":"For the inspection of plastic gears, most factories still rely on manual methods with measurement tools. Therefore, the effort expended on defect detection in the production process is tremendous. This paper proposes a new method that detects defects in plastic gears during their production and recycling processes. An image dataset of different kinds of plastic gears was created. Then, a defect detection deep learning (DL) model was proposed based on GoogLeNet; it detects whether a plastic gear has missing teeth (MT), edge fin (EF), or good quality (GQ). An independent dataset was created to test the DL model: the accuracy of this model reached 94.8%. By combining machine vision (MV) and DL methods, this paper realizes the automatic detection of plastic gear defects. Based on the independent plastic gear dataset, the effectiveness of the defect detection method is verified by experiments. The results have important theoretical value and practical significance for freeing up manpower and promoting the automation of plastic gear defect detection.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125987872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reference-driven undersampled MRI reconstruction using automated stopping deep image prior","authors":"Guisong Wang, Xiaofeng Du, Yanhua Qin, Yifan He","doi":"10.1117/12.2644282","DOIUrl":"https://doi.org/10.1117/12.2644282","url":null,"abstract":"Magnetic resonance image (MRI) reconstruction from undersampled k-space data using unsupervised learning methods suffers from insufficient a priori knowledge and the lack of stopping criterion. This work introduces a high-resolution reference image to tackle these issues. Specifically, we explicitly broadcast the reference image into the proposed network, transferring the reference image structure priors to the recovered image. In addition, the reference image helps to develop a criterion to determine the best-reconstructed image, so training stops automatically once the conditions are met. Experimental results show that the proposed method can reduce artifacts without using a priori training set.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"12342 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130310753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tobacco plant disease dataset","authors":"Hong Lin, Rita Tse, Su-Kit Tang, Z. Qiang, Jinliang Ou, Giovanni Pau","doi":"10.1117/12.2644288","DOIUrl":"https://doi.org/10.1117/12.2644288","url":null,"abstract":"Tobacco is a valuable plant in the agricultural and commercial industries. Any disease infection of the plant may lower the harvest and interfere with the operation of the supply chain in the market. Image-based deep learning methods are cutting-edge technologies that can facilitate the diagnosis of diseases efficiently and effectively when a large-scale dataset is available for training. However, there is currently no public dataset about tobacco. A comprehensive dataset is urgently needed to take advantage of deep learning methods in tobacco cultivation. In this paper, we propose to create a specific dataset for tobacco diseases, called Tobacco Plant Disease Dataset (TPDD). 2721 tobacco leaf images are taken in the field. The dataset serves two purposes: disease classification and leaf detection. For classification, we identify 12 classes and provide two types of disease annotations: 1) Whole Leaf Section; 2) Disease Fragment Section. For leaf detection, we provide two kinds of bounding box: rectangle bounding box and polygon bounding box. In addition, we conduct baseline experiments to illustrate the usefulness of TPDD: 1) using a deep learning model to detect single disease and multiple diseases; 2) using YOLO-v3 and Mask-RCNN to detect leaves. We hope that the dataset can support the tobacco industry and also serve as a benchmark in fine-grained vision classification.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127360859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring system for elongation at break of cable insulation sheath based on machine vision","authors":"X. Su, Gangwei Wang, Zhiqiang Zhang, Jiale Yang, Zhijia Zhang","doi":"10.1117/12.2643116","DOIUrl":"https://doi.org/10.1117/12.2643116","url":null,"abstract":"In the production of power cables, the performance test of the cable insulation sheath is an important part. Compared with traditional testing methods, machine vision has the advantages of stable operation, high precision, and high efficiency. Motivated by this, the structure of a conventional tensile machine was rebuilt on the basis of machine vision theory, and the whole tensile test process of the cable insulation sheath was imaged by a CMOS camera. A color recognition algorithm, an effective area segmentation algorithm, a workpiece fracture judgment algorithm, and a corrosion difference algorithm are proposed and used to calculate the distance between the marked lines and, from it, the elongation at break of the cable material. In systematic experiments on the same batch of cable jackets, the maximum deviation of the elongation at break measured by visual inspection is no more than 1%. The experimental results and practical applications show that the machine vision-based inspection system has higher accuracy, higher efficiency, and more stable and reliable operation than the traditional inspection system.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"12342 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129415939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D face alignment and face reconstruction based on image sequence","authors":"Y. Wei, Biao Qiao, Hua-bin Wang, Mengxin Zhang, Shijun Liu, L. Tao","doi":"10.1117/12.2644477","DOIUrl":"https://doi.org/10.1117/12.2644477","url":null,"abstract":"Existing 3D face alignment and face reconstruction methods mainly focus on the accuracy of the model. When these methods are applied to dynamic videos, their stability and accuracy are significantly reduced. To overcome this problem, we propose a novel regression framework that strikes a balance between accuracy and stability. First, on the basis of a lightweight backbone, an encoder-decoder structure is used to jointly learn expression details and detailed 3D faces from video images, recovering shape details and their relationship to facial expression; dynamic regression of a small number of 3D face parameters effectively improves speed and accuracy. Second, to further improve the stability of face landmarks in video, a jitter loss function for multi-frame joint learning is proposed to strengthen the correlation between frames and face landmarks in video and to reduce the amplitude of landmark differences between adjacent frames, thereby reducing landmark jitter. Experiments on several challenging datasets verify the effectiveness of our method.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127625035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic segmentation of road scene based on multi-scale feature extraction and deep supervision","authors":"Longfei Wang, Chunman Yan","doi":"10.1117/12.2644695","DOIUrl":"https://doi.org/10.1117/12.2644695","url":null,"abstract":"The traditional U-Net model suffers from inaccurate segmentation edges, poor adaptability to multi-scale road targets, and a tendency toward false and missed segmentation when segmenting road targets with varied and changing occlusions. To address these problems, a semantic segmentation model for road scenes based on multi-scale feature extraction and a deep supervision module is proposed. Firstly, a dual attention module is embedded in the U-Net encoder, which enables the model to capture global context information along the channel and spatial dimensions and enhances the road features. Secondly, before upsampling, the feature map containing high-level semantic information is input into an ASPP module to obtain road features at different scales. Finally, a deep supervision module is introduced into the upsampling part to learn feature representations at different levels and retain more road detail features. Experiments are carried out on the CamVid and Cityscapes datasets. The results show that our network can effectively segment road targets at different scales, and the segmented road contours are more complete and clear, which improves the accuracy of semantic segmentation while preserving an acceptable segmentation speed.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122639747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster-based point cloud attribute compression using inter prediction and graph Fourier transform","authors":"Jiaying Liu, Jin Wang, Longhua Sun, Jie Pei, Qing Zhu","doi":"10.1117/12.2644218","DOIUrl":"https://doi.org/10.1117/12.2644218","url":null,"abstract":"With the rapid development of 3D capture technologies, point clouds have been widely used in many emerging applications such as augmented reality, autonomous driving, and 3D printing. However, a point cloud used to represent real-world objects in these applications may contain millions of points, which results in a huge data volume. Therefore, efficient compression algorithms are essential for point cloud storage and real-time transmission. In particular, the attribute compression of point clouds is still challenging owing to the sparsity and irregular distribution of the corresponding points in 3D space. In this paper, we present a novel point cloud attribute compression scheme based on inter-prediction of blocks and graph Laplacian transforms for the attribute residual. Firstly, we divide the entire point cloud into adaptive sub-clouds via K-means clustering based on the geometry, which enables efficient representation at lower cost. Secondly, the attributes of the sub-clouds are divided into two parts: the attribute means of the sub-clouds and the attribute residuals obtained by removing the means. For the attribute means, we use inter-prediction between sub-clouds to remove the attribute redundancy, and the attribute residual is encoded after a graph Fourier transform. Experimental results demonstrate that the proposed scheme is much more efficient than traditional attribute compression schemes.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115594123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic heart segmentation based on convolutional networks using attention mechanism","authors":"Guodong Zhang, Yu Liu, Wei Guo, Wenjun Tan, Zhaoxuan Gong, M. Farooq","doi":"10.1117/12.2643378","DOIUrl":"https://doi.org/10.1117/12.2643378","url":null,"abstract":"Heart segmentation is challenging due to the poor contrast of the heart in CT images. Since manual segmentation of the heart is tedious and time-consuming, we propose an attention-based Convolutional Neural Network (CNN) for heart segmentation. First, one-hot preprocessing is performed on the multi-tissue CT images. A U-Net with attention gates is then applied to obtain the heart region. We compared our method with several CNN methods in terms of Dice coefficient. Results show that our method outperforms the other methods for segmentation.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115969236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Radio frequency interference suppression based on two-dimensional frequency domain notch for P-band ultra-wideband SAR","authors":"Kang Liang, Hongtu Xie, Xinqiao Jiang, Xiao Hu, Kaipeng Chen, Guoqian Wang","doi":"10.1117/12.2643264","DOIUrl":"https://doi.org/10.1117/12.2643264","url":null,"abstract":"P-band ultra-wideband synthetic aperture radar (UWB SAR) not only offers high-resolution imaging but also has good foliage penetration capability, giving it the potential to detect and image targets concealed under vegetation. However, the P-band contains many radio, television and mobile communication signals, which are referred to as radio frequency interference (RFI) signals. These RFI signals are mixed with the target echo signals and cause serious interference in P-band UWB SAR imaging. The traditional notch method is easy to implement for RFI suppression, so it has been widely used. However, the traditional notch method notches each pulse echo individually, which has a high computational complexity. At the same time, suppressing RFI in each pulse echo separately often leaves a large amount of residual interference, so the traditional notch method has a poor RFI suppression effect. Building on the traditional notch method, this paper proposes an RFI suppression method based on a two-dimensional frequency domain (2DFD) notch, which processes all echo pulses at once and thereby improves the efficiency of RFI suppression. Meanwhile, because the bandwidth of the RFI signal is much smaller than that of the SAR echo signal, converting the received SAR echo signal to the 2DFD further concentrates the energy of the RFI signals, which yields a better RFI suppression effect. The simulation results show that the proposed RFI suppression method based on the 2DFD notch not only improves the efficiency of RFI suppression but also achieves a better suppression effect.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"617 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115827208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}