{"title":"Receptive field enhancement and attention feature fusion network for underwater object detection","authors":"Huipu Xu, Zegang He, Shuo Chen","doi":"10.1117/1.jei.33.3.033007","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033007","url":null,"abstract":"Underwater environments have characteristics such as unclear imaging and complex backgrounds that lead to poor performance when applying mainstream object detection models directly. To improve the accuracy of underwater object detection, we propose an object detection model, RF-YOLO, which uses a receptive field enhancement (RFE) module in the backbone network to finish RFE and extract more effective features. We design the free-channel iterative attention feature fusion module to reconstruct the neck network and fuse different scales of feature layers to achieve cross-channel attention feature fusion. We use Scylla-intersection over union (SIoU) as the loss function of the model, which makes the model converge to the optimal direction of training through the angle cost, distance cost, shape cost, and IoU cost. The network parameters increase after adding modules, and the model is not easy to converge to the optimal state, so we propose a training method that effectively mines the performance of the detection network. Experiments show that the proposed RF-YOLO achieves a mean average precision of 87.56% and 86.39% on the URPC2019 and URPC2020 datasets, respectively. Through comparative experiments and ablation experiments, it was verified that the proposed network model has a higher detection accuracy in complex underwater environments.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"18 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Posture-guided part learning for fine-grained image categorization","authors":"Wei Song, Dongmei Chen","doi":"10.1117/1.jei.33.3.033013","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033013","url":null,"abstract":"The challenge in fine-grained image classification tasks lies in distinguishing subtle differences among fine-grained images. Existing image classification methods often only explore information in isolated regions without considering the relationships among these parts, resulting in incomplete information and a tendency to focus on individual parts. Posture information is hidden among these parts, so it plays a crucial role in differentiating among similar categories. Therefore, we propose a posture-guided part learning framework capable of extracting hidden posture information among regions. In this framework, the dual-branch feature enhancement module (DBFEM) highlights discriminative information related to fine-grained objects by extracting attention information between the feature space and channels. The part selection module selects multiple discriminative parts based on the attention information from DBFEM. Building upon this, the posture feature fusion module extracts semantic features from discriminative parts and constructs posture features among different parts based on these semantic features. Finally, by fusing part semantic features with posture features, a comprehensive representation of fine-grained object features is obtained, aiding in differentiating among similar categories. Extensive evaluations on three benchmark datasets demonstrate the competitiveness of the proposed framework compared with state-of-the-art methods.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"23 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms","authors":"Bingyin Tang, Fan Feng","doi":"10.1117/1.jei.33.3.033002","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033002","url":null,"abstract":"We introduce a method for efficient and expressive high-resolution image synthesis, harnessing the power of variational autoencoders (VAEs) and transformers with sparse attention (SA) mechanisms. By utilizing VAEs, we can establish a context-rich vocabulary of image constituents, thereby capturing intricate image features in a superior manner compared with traditional techniques. Subsequently, we employ SA mechanisms within our transformer model, improving computational efficiency while dealing with long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model successfully integrates both nonspatial and spatial information while also incorporating temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate our method’s effectiveness in semantically guided synthesis of megapixel images. Our findings substantiate this method as a significant contribution to the field of high-resolution image synthesis.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"15 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Test-time adaptation via self-training with future information","authors":"Xin Wen, Hao Shen, Zhongqiu Zhao","doi":"10.1117/1.jei.33.3.033012","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033012","url":null,"abstract":"Test-time adaptation (TTA) aims to address potential differences in data distribution between the training and testing phases by modifying a pretrained model based on each specific test sample. This process is especially crucial for deep learning models, as they often encounter frequent changes in the testing environment. Currently, popular TTA methods rely primarily on pseudo-labels (PLs) as supervision signals and fine-tune the model through backpropagation. Consequently, the success of the model’s adaptation depends directly on the quality of the PLs. High-quality PLs can enhance the model’s performance, whereas low-quality ones may lead to poor adaptation results. Intuitively, if the PLs predicted by the model for a given sample remain consistent in both the current and future states, it suggests a higher confidence in that prediction. Using such consistent PLs as supervision signals can greatly benefit long-term adaptation. Nevertheless, this approach may induce overconfidence in the model’s predictions. To counter this, we introduce a regularization term that penalizes overly confident predictions. Our proposed method is highly versatile and can be seamlessly integrated with various TTA strategies, making it immensely practical. We investigate different TTA methods on three widely used datasets (CIFAR10C, CIFAR100C, and ImageNetC) with different scenarios and show that our method achieves competitive or state-of-the-art accuracies on all of them.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion trajectory reconstruction degree: a key frame selection criterion for surveillance video","authors":"Yunzuo Zhang, Yameng Liu, Jiayu Zhang, Shasha Zhang, Shuangshuang Wang, Yu Cheng","doi":"10.1117/1.jei.33.3.033009","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033009","url":null,"abstract":"The primary focus of key frame extraction lies in extracting changes in the motion state from surveillance videos and considering them to be crucial content. However, existing key frame evaluation indicators cannot accurately assess whether the algorithm can capture them. Hence, key frame extraction methods are assessed from the viewpoint of target trajectory reconstruction. The motion trajectory reconstruction degree (MTRD), a key frame selection criterion based on maintaining target global and local motion information, is then put forth. Initially, this evaluation indicator extracts key frames using various key frame extraction methods and reconstructs the motion trajectory based on these key frames using a linear interpolation algorithm. Then, the original motion trajectories of the target are quantified and compared with the reconstructed set of motion trajectories. The more minor the MTRD discrepancy is, the better the trajectory overlap is, and the more accurate the key frames extracted with this method will be for the description of the video content. Finally, inspired by the novel MTRD criterion, we develop an MTRD-oriented key frame extraction method for the surveillance video. The outcomes of the simulations demonstrate that MTRD can more accurately capture the variations in the global and local motion states and is more compatible with the human visual perception.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HMNNet: research on exposure-based nighttime semantic segmentation","authors":"Yang Yang, Changjiang Liu, Hao Li, Chuan Liu","doi":"10.1117/1.jei.33.3.033015","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033015","url":null,"abstract":"In recent years, various segmentation models have been developed successively. However, due to the limited availability of nighttime datasets and the complexity of nighttime scenes, there remains a scarcity of high-performance nighttime semantic segmentation models. Analysis of nighttime scenes has revealed that the primary challenges encountered are overexposure and underexposure. In view of this, our proposed Histogram Multi-scale Retinex with Color Restoration and No-Exposure Semantic Segmentation Network model is based on semantic segmentation of nighttime scenes and consists of three modules and a multi-head decoder. The three modules—Histogram, Multi-Scale Retinex with Color Restoration (MSRCR), and No Exposure (N-EX)—aim to enhance the robustness of image segmentation under different lighting conditions. The Histogram module prevents over-fitting to well-lit images, and the MSRCR module enhances images with insufficient lighting, improving object recognition and facilitating segmentation. The N-EX module uses a dark channel prior method to remove excess light covering the surface of an object. Extensive experiments show that the three modules are suitable for different network models and can be inserted and used at will. They significantly improve the model’s segmentation ability for nighttime images while having good generalization ability. When added to the multi-head decoder network, mean intersection over union increases by 6.2% on the nighttime dataset Rebecca and 1.5% on the daytime dataset CamVid.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"27 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep metric learning method for open-set iris recognition","authors":"Guang Huo, Ruyuan Li, Jianlou Lou, Xiaolu Yu, Jiajun Wang, Xinlei He, Yue Wang","doi":"10.1117/1.jei.33.3.033016","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033016","url":null,"abstract":"The existing iris recognition methods offer excellent recognition performance for known classes, but they do not perform well when faced with unknown classes. The process of identifying unknown classes is referred to as open-set recognition. To improve the robustness of iris recognition system, this work integrates a hash center to construct a deep metric learning method for open-set iris recognition, called central similarity based deep hash. It first maps each iris category into defined hash centers using a generation hash center algorithm. Then, OiNet is trained to each iris texture to cluster around the corresponding hash center. For testing, cosine similarity is calculated for each pair of iris textures to estimate their similarity. Based on experiments conducted on public datasets, along with evaluations of performance within the dataset and across different datasets, our method demonstrates substantial performance advantages compared with other algorithms for open-set iris recognition.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"127 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KT-NeRF: multi-view anti-motion blur neural radiance fields","authors":"Yining Wang, Jinyi Zhang, Yuxi Jiang","doi":"10.1117/1.jei.33.3.033006","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033006","url":null,"abstract":"In the field of three-dimensional (3D) reconstruction, neural radiation fields (NeRF) can implicitly represent high-quality 3D scenes. However, traditional neural radiation fields place very high demands on the quality of the input images. When motion blurred images are input, the requirement of NeRF for multi-view consistency cannot be met, which results in a significant degradation in the quality of the 3D reconstruction. To address this problem, we propose KT-NeRF that extends NeRF to motion blur scenes. Based on the principle of motion blur, the method is derived from two-dimensional (2D) motion blurred images to 3D space. Then, Gaussian process regression model is introduced to estimate the motion trajectory of the camera for each motion blurred image, with the aim of learning accurate camera poses at key time stamps during the exposure time. The camera poses at the key time stamps are used as inputs to the NeRF in order to allow the NeRF to learn the blur information embedded in the images. Finally, the parameters of the Gaussian process regression model and the NeRF are jointly optimized to achieve multi-view anti-motion blur. The experiment shows that KT-NeRF achieved a peak signal-to-noise ratio of 29.4 and a structural similarity index of 0.85, an increase of 3.5% and 2.4%, respectively, over existing advanced methods. The learned perceptual image patch similarity was also reduced by 7.1% to 0.13.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"105 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimensional shape estimation of wires from three-dimensional X-ray computed tomography images of electrical cables","authors":"Shiori Ueda, Kanon Sato, Hideo Saito, Yutaka Hoshina","doi":"10.1117/1.jei.33.3.031209","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.031209","url":null,"abstract":"Electrical cables consist of numerous wires, the three-dimensional (3D) shape of which significantly impacts the cables’ overall properties, such as bending stiffness. Although X-ray computed tomography (CT) provides a non-destructive method to assess these properties, accurately determining the 3D shape of individual wires from CT images is challenging due to the large number of wires, low image resolution, and indistinguishable appearance of the wires. Previous research lacked quantitative evaluation for wire tracking, and its overall accuracy heavily relied on the accuracy of wire detection. In this study, we present a long short-term memory-based approach for wire tracking that improves robustness against detection errors. The proposed method predicts wire positions in subsequent frames based on previous frames. We evaluate the performance of the proposed method using both actual annotated cables and artificially noised annotations. Our method exhibits greater tracking accuracy and robustness to detection errors compared with the previous method.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"17 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140602006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device","authors":"Stefano Toigo, Brendon Kasi, Daniele Fornasier, Angelo Cenedese","doi":"10.1117/1.jei.33.3.031208","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.031208","url":null,"abstract":"This paper aims to delineate a comprehensive method that integrates machine vision and deep learning for quality control within an industrial setting. The proposed innovative approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while focusing on the employment of affordable, compact hardware, and it achieves exceptionally high accuracy in performing the quality control task and keeping a minimal computation time. Consequently, the developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By leveraging the integration of the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"88 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}