{"title":"Squeeze-and-excitation attention and bi-directional feature pyramid network for filter screens surface detection","authors":"Junpeng Xu, Xiangbo Zhu, Lei Shi, Jin Li, Ziman Guo","doi":"10.1117/1.jei.33.4.043044","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043044","url":null,"abstract":"Based on the enhanced YOLOv5, a deep learning defect detection technique is presented to deal with the problem of inadequate effectiveness in manually detecting problems on the surface of filter screens. In the last layer of the backbone network, the method combines the squeeze-and-excitation attention mechanism module, the method assigns weights to image locations based on the channel domain perspective to obtain more feature information. It also compares the results with a simple, parameter-free attention model (SimAM), which is an attention mechanism without the channel domain, and the results are higher than SimAM 0.7%. In addition, the neck network replaces the basic PANet structure with the bi-directional feature pyramid network module, which introduces multi-scale feature fusion. The experimental results show that the improved YOLOv5 algorithm has an average defect detection accuracy of 97.7% on the dataset, which is 11.3%, 12.8%, 2%, 7.8%, 5.1%, and 1.3% higher than YOLOv3, faster R-CNN, YOLOv5, SSD, YOLOv7, and YOLOv8, respectively. It can quickly and accurately identify various defects on the surface of the filter, which has an outstanding contribution to the filter manufacturing industry.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"12 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusion 3D object tracking method based on region and point cloud registration","authors":"Yixin Jin, Jiawei Zhang, Yinhua Liu, Wei Mo, Hua Chen","doi":"10.1117/1.jei.33.4.043048","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043048","url":null,"abstract":"Tracking rigid objects in three-dimensional (3D) space and 6DoF pose estimating are essential tasks in the field of computer vision. In general, the region-based 3D tracking methods have emerged as the optimal solution for weakly textured objects tracking within intricate scenes in recent years. However, tracking robustness in situations such as partial occlusion and similarly colored backgrounds is relatively poor. To address this issue, an improved region-based tracking method is proposed for achieving accurate 3D object tracking in the presence of partial occlusion and similarly colored backgrounds. First, a regional cost function based on the correspondence line is adopted, and a step function is proposed to alleviate the misclassification of sampling points in scenes. Afterward, in order to reduce the influence of similarly colored background and partial occlusion on the tracking performance, a weight function that fuses color and distance information of the object contour is proposed. Finally, the transformation matrix of the inter-frame motion obtained by the above region-based tracking method is used to initialize the model point cloud, and an improved point cloud registration method is adopted to achieve accurate registration between the model point cloud and the object point cloud to further realize accurate object tracking. The experiments are conducted on the region-based object tracking (RBOT) dataset and the real scenes, respectively. The results demonstrate that the proposed method outperforms the state-of-the-art region-based 3D object tracking method. On the RBOT dataset, the average tracking success rate is improved by 0.5% across five image sequences. In addition, in real scenes with similarly colored backgrounds and partial occlusion, the average tracking accuracy is improved by 0.28 and 0.26 mm, respectively.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"8 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-temporal enhancement method based on dense connection structure for compressed video","authors":"Hongyao Li, Xiaohai He, Xiaodong Bi, Shuhua Xiong, Honggang Chen","doi":"10.1117/1.jei.33.4.043054","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043054","url":null,"abstract":"Under limited bandwidth conditions, video transmission often employs lossy compression to reduce the data volume, inevitably introducing compression noise. Quality enhancement of compressed videos can effectively recover the information loss incurred during the compression process. Currently, multi-frame quality enhancement of compressed videos has shown performance advantages compared to single-frame methods, as it utilizes the temporal correlation of videos. Methods based on deformable convolutions obtain spatio-temporal fusion features for reconstruction through multi-frame alignment. However, due to the limited utilization of deep information and sensitivity to alignment accuracy, these methods yield suboptimal results, especially in scenarios with scene changes and intense motion. To overcome these limitations, we propose a dense network-based quality enhancement method to obtain more accurate spatio-temporal fusion features. Specifically, the deep spatial features are first extracted from the to-be-enhanced frames using dense connections, then combined with the aligned features obtained from deformable convolution through the convolution and attention mechanism to make the network more attentive to useful branches in an adaptive way, and finally, the enhanced frames are obtained through the quality enhancement module of the dense connection structure. The experimental results show that when the quantization parameter is 37, the proposed method can improve the average peak signal-to-noise ratio by 0.99 dB in the lowdelay_P configuration.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"20 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and robust object region segmentation with self-organized lattice Boltzmann based active contour method","authors":"Fatema A. Albalooshi, Vijayan K. Asari","doi":"10.1117/1.jei.33.4.043050","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043050","url":null,"abstract":"We propose an approach leveraging the power of self-organizing maps (SOMs) in conjunction with a multiscale local image fitting (LIF) level-set function to enhance the capabilities of the region-based active contour model (ACM). In addition, we employ the lattice Boltzmann method (LBM) to ensure efficient convergence during the segmentation process. The SOM learns the underlying patterns and structures of both the background region and the object of interest region in an image, allowing for more accurate and robust segmentation results. Our multiscale LIF level-set approach influences image-specific fitting criteria into the energy functional, considering the features extracted by the SOM. Finally, the LBM is utilized to solve the level set equation and evolve the contour, allowing for a faster contour evolution. To evaluate the effectiveness of our approach, we performed our experiments on the challenging Pascal Visual Object Classes Challenge 2012 dataset. This dataset consists of images containing objects with diverse characteristics, such as illumination variations, shadows, occlusions, scale changes, and cluttered backgrounds. Our experimental results highlight the efficiency and robustness of our proposed method in achieving accurate segmentation. In terms of accuracy, our approach outperforms state-of-the-art learning-based ACMs, reaching a precision value of up to 93%. Moreover, our approach also demonstrates improvements in terms of computation time, leading to a reduction in computational time of 76% compared with the state-of-the-art methods. By integrating SOMs and the LBM, we enhance the efficiency of the segmentation process. This enables us to achieve accurate segmentation within reasonable time frames, making our method practical for real-world applications. Furthermore, we conducted experiments on medical imagery and thermal imagery, which yielded precise results.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"7 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepLab-Rail: semantic segmentation network for railway scenes based on encoder-decoder structure","authors":"Qingsong Zeng, Linxuan Zhang, Yuan Wang, Xiaolong Luo, Yannan Chen","doi":"10.1117/1.jei.33.4.043038","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043038","url":null,"abstract":"Understanding the perimeter objects and environment changes in railway scenes is crucial for ensuring the safety of train operation. Semantic segmentation is the basis of intelligent perception and scene understanding. Railway scene categories are complex and effective features are challenging to extract. This work proposes a semantic segmentation network DeepLab-Rail based on classic yet effective encoder-decoder structure. It contains a lightweight feature extraction backbone embedded with channel attention (CA) mechanism to keep computational complexity low. To enrich the receptive fields of convolutional modules, we design a parallel and cascade convolution module called compound-atrous spatial pyramid pooling and a combination of dilated convolution ratio is selected through experiments to obtain multi-scale features. To fully use the shallow features and the high-level features, efficient CA mechanism is introduced and also the mixed loss function is designed for the problem of unbalanced label categories of the dataset. Finally, the experimental results on the RailSem19 railway dataset show that the mean intersection over union reaches 65.52% and the PA reaches 88.48%. The segmentation performance of railway confusing facilities, such as signal lights and catenary pillars, has been significantly improved and surpasses other advanced methods to our best knowledge.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"44 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Settlement detection from satellite imagery using fully convolutional network","authors":"Tayaba Anjum, Ahsan Ali, Muhammad Tahir Naseem","doi":"10.1117/1.jei.33.4.043056","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043056","url":null,"abstract":"Geospatial information is essential for development planning, like in the context of land and resource management. Existing research mainly focuses on multi-spectral or panchromatic images with specific sensor details. Incorporating multi-sensory panchromatic images at different scales makes the segmentation problem challenging. In this work, we propose a pixel-based globally trained model with a deep learning network to improve the segmentation results over existing patch-based networks. The proposed model consists of the encoder-decoder mechanism for semantic segmentation. Convolution and pooling layers are used at the encoding phase and transposed convolution and convolution layers are used for the decoding phase. Experiments show about 98.95% correct detection rate and 0.07% false detection rate of our proposed methodology on benchmark images. We prove the effectiveness of the proposed methodology by doing comparisons with previous work.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coded target recognition algorithm for vision measurement","authors":"Peng Zhang, Qing Liu, Shengpeng Li, Fei Liu, Wenjing Liu","doi":"10.1117/1.jei.33.4.043058","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043058","url":null,"abstract":"Circularly coded targets are widely used in 3D measurement, target tracking, augmented reality, and other fields as feature points to be measured. The traditional coded target recognition algorithm is easily affected by illumination changes and excessive shooting angles, and the recognition accuracy is significantly reduced. Therefore, a new coded target recognition algorithm is required to reduce the effects of illumination and angle on the recognition process. The influence of illumination on the recognition of coding targets was analyzed in depth, and the advantages and disadvantages of traditional algorithms are discussed. A new adaptive threshold image segmentation method was designed, which, in contrast to traditional algorithms, incorporates the feature information of coding targets in the determination of the image segmentation threshold. The experimental results show that this method significantly reduces the influence of illumination variations and cluttered backgrounds on image segmentation. Similarly, the influence of different angles on the recognition process of coding targets was studied. The coding target is decoded by radial sampling of the dense point network, which can effectively reduce the influence of angle on the recognition process and improve the recognition accuracy of coding targets and the robustness of the algorithm. In addition, further experiments verified that the proposed detection and recognition algorithm can better extract and identify with high positioning accuracy and decoding success rate. It can achieve accurate positioning even in complex environments and meet the needs of industrial measurements.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"11 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep inner-knuckle-print recognition using lightweight Siamese network","authors":"Hongxia Wang, Hongwu Yuan","doi":"10.1117/1.jei.33.4.043034","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043034","url":null,"abstract":"Texture features and stability have attracted much attention in the field of biometric recognition. The inner-knuckle print is unique and not easy to forge, so it is widely used in personal identity authentication, criminal detection, and other fields. In recent years, the rapid development of deep learning technology has brought new opportunities for internal-knuckle recognition. We propose a deep inner-knuckle print recognition method named LSKNet network. By establishing a lightweight Siamese network model and combining it with a robust cost function, we can realize efficient and accurate recognition of the inner-knuckle print. Compared to traditional methods and other deep learning methods, the network has lower model complexity and computational resource requirements, which enables it to run under lower hardware configurations. In addition, this paper also uses all the knuckle prints of four fingers for concatenated fusion recognition. Experimental results demonstrate that this method has achieved satisfactory results in the task of internal-knuckle print recognition.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"15 12 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141885521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-tuned Siamese neural network–based multimodal vein biometric system with hybrid firefly–particle swarm optimization","authors":"Gurunathan Velliangiri, Sudhakar Radhakrishnan","doi":"10.1117/1.jei.33.4.043035","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043035","url":null,"abstract":"Recent advancements in biometric recognition focus on vein pattern–based person authentication systems. We present a multimodal biometric system using dorsal and finger vein images. By combining Siamese neural networks (SNNs) with hybrid firefly–particle swarm optimization (FF-PSO), we optimize finger and dorsal vein identification and classification. Using FF-PSO to tune SNN parameters is an innovative hybrid optimization approach designed to address the complexities of vein pattern recognition. The proposed system is tested with two public databases: the SDUMLA-HMT finger vein dataset and the Dr. Badawi hand vein dataset. The efficacy of the method is assessed using performance measures such as recall, accuracy, precision, F1 score, false acceptance rate, false rejection rate, and equal error rate. The experimental findings demonstrate that the proposed system achieves an accuracy of 99.5% with the fine-tune SNN and FF-PSO techniques and preprocessing module. The proposed system is also compared with various existing state-of-the-art techniques.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"42 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141885653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint merging and pruning: adaptive selection of better token compression strategy","authors":"Wei Peng, Liancheng Zeng, Lizhuo Zhang, Yue Shen","doi":"10.1117/1.jei.33.4.043045","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043045","url":null,"abstract":"Vision transformer (ViT) is widely used to handle artificial intelligence tasks, making significant advances in a variety of computer vision tasks. However, due to the secondary interaction between tokens, the ViT model is inefficient, which greatly limits the application of the ViT model in real scenarios. In recent years, people have noticed that not all tokens contribute equally to the final prediction of the model, so token compression methods have been proposed, which are mainly divided into token pruning and token merging. Yet, we believe that neither pruning only to reduce non-critical tokens nor merging to reduce similar tokens are optimal strategies for token compression. To overcome this challenge, this work proposes a token compression framework: joint merging and pruning (JMP), which adaptively selects a better token compression strategy based on the similarity between critical tokens and non-critical tokens in each sample. JMP effectively reduces computational complexity while maintaining model performance and does not require the introduction of additional trainable parameters, achieving a good balance between efficiency and performance. Taking DeiT-S as an example, JMP reduces floating point operations by 35% and increases throughput by more than 45% while only decreasing accuracy by 0.2% on ImageNet.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"405 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}