Length and salient losses co-supported content-based commodity retrieval neural network
Mengqi Chen, Yifan Wang, Qian Sun, Weiming Wang, Fu Lee Wang
Journal of Electronic Imaging 33(3), June 2024. DOI: 10.1117/1.jei.33.3.033036
Abstract: Content-based commodity retrieval (CCR) faces two major challenges: (1) commodities in real-world scenarios are often captured randomly by users, resulting in significant variations in image backgrounds, poses, shooting angles, and brightness; and (2) many commodities in the CCR dataset have similar appearances but belong to different brands or are distinct products within the same brand. We introduce a CCR neural network called CCR-Net, which incorporates both a length loss and a salient loss. These two losses can operate independently or collaboratively to enhance retrieval quality. CCR-Net offers several advantages, including the ability to (1) minimize data variations in real-world captured images and (2) differentiate between images containing highly similar but fundamentally distinct commodities, resulting in improved commodity retrieval capabilities. Comprehensive experiments demonstrate that our CCR-Net achieves state-of-the-art performance on the CUB200-2011, Perfect500k, and Stanford Online Products datasets for commodity retrieval tasks.

Spiking ViT: spiking neural networks with transformer-attention for steel surface defect classification
Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, Zhenghui Ge
Journal of Electronic Imaging 33(3), May 2024. DOI: 10.1117/1.jei.33.3.033001
Abstract: Throughout the steel production process, a variety of surface defects inevitably occur. These defects can impair the quality of steel products and reduce manufacturing efficiency. Therefore, it is crucial to study and categorize the multiple defects on the surface of steel strips. The vision transformer (ViT) is a neural network model based on a self-attention mechanism that is widely used in many different disciplines. A conventional ViT ignores the specifics of brain signaling and instead uses activation functions to simulate genuine neurons. One of the fundamental building blocks of a spiking neural network is the leaky integrate-and-fire (LIF) neuron, which has biodynamic characteristics akin to those of a genuine neuron. LIF neurons work in an event-driven manner such that higher performance can be achieved with less power. The goal of this work is to integrate ViT and LIF neurons to build and train an end-to-end hybrid network architecture, the spiking vision transformer (S-ViT), for the classification of steel surface defects. The framework builds on the ViT architecture by replacing the activation functions used in ViT with LIF neurons, constructing a spiking transformer encoder with a global spike feature fusion module and a spiking-MLP classification head, and using these as the basic building blocks of S-ViT. Based on the experimental results, our method demonstrates outstanding classification performance across all metrics. The overall test accuracies of S-ViT are 99.41%, 99.65%, 99.54%, and 99.77% on NEU-CLSs, and 95.70%, 95.93%, 96.94%, and 97.19% on XSDD. S-ViT achieves superior classification performance compared to convolutional neural networks and recent methods, and its performance also improves on the original ViT model. Furthermore, robustness tests show that S-ViT maintains reliable accuracy when recognizing images that contain Gaussian noise.

{"title":"Flotation froth image deblurring algorithm based on disentangled representations","authors":"Xianwu Huang, Yuxiao Wang, Zhao Cao, Haili Shang, Jinshan Zhang, Dahua Yu","doi":"10.1117/1.jei.33.3.033011","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033011","url":null,"abstract":"The deblurring of flotation froth images significantly aids in the characterization of coal flotation and fault diagnosis. Images of froth acquired at a flotation site contain considerable noise and blurring, making feature extraction and segmentation processing difficult. We present an effective method for deblurring froth images based on disentangled representations. Disentangled representation is achieved by separating the content and blur features in the blurred image using a content encoder and a blur encoder. Then, the separated feature vectors are embedded into a deblurring framework to deblur the froth image. The experimental results show that this method achieves a superior deblurring effect on froth images under various conditions, which lays the foundation for the intelligent adjustment of parameters to guide the flotation site.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"19 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective grasp detection method based on Swin transformer
Jing Zhang, Yulin Tang, Yusong Luo, Yukun Du, Mingju Chen
Journal of Electronic Imaging 33(3), May 2024. DOI: 10.1117/1.jei.33.3.033008
Abstract: Grasp detection within unstructured environments encounters challenges that lower the success rate of grasping attempts, attributable to factors such as object uncertainty, random positions, and differences in perspective. This work proposes a grasp detection framework, Swin-transNet, which treats graspable objects as a generalized category and distinguishes between graspable and non-graspable objects. The Swin transformer module in this framework augments feature extraction, enabling the capture of global relationships within images. The integration of a decoupled head with attention mechanisms further refines the channel and spatial representation of features, and we elucidate the roles of these components in grasping tasks. This combination markedly improves the system's adaptability to uncertain object categories and random positions, culminating in the precise output of grasping information. We evaluate the framework on the Cornell grasp dataset under both image-wise and object-wise splits, obtaining a detection accuracy of 98.1% with a detection time of 52 ms. Swin-transNet also generalizes robustly to the Jacquard dataset, attaining a detection accuracy of 95.2%, and it achieves an 87.8% success rate in real-world grasping tests on a visual grasping system, confirming its effectiveness for robotic grasping tasks.

Truss tomato detection under artificial lighting in greenhouse using BiSR_YOLOv5
Xiaoyou Yu, Zixiao Wang, Zhonghua Miao, Nan Li, Teng Sun
Journal of Electronic Imaging 33(3), May 2024. DOI: 10.1117/1.jei.33.3.033014
Abstract: The visual characteristics of greenhouse-grown tomatoes change significantly under artificial lighting, presenting substantial challenges for accurate target detection. To address the diverse appearances of targets, we propose an improved You Only Look Once version 5 (YOLOv5) model named BiSR_YOLOv5, incorporating a single-point and regional feature fusion module (SRFM) and a bidirectional spatial pyramid pooling fast (Bi-SPPF) module. In addition, the model adopts the SCYLLA intersection-over-union (SIoU) loss in place of the complete intersection-over-union (CIoU) loss. Experimental results show that BiSR_YOLOv5 achieves F1 and mAP@0.5 scores of 0.867 and 0.894, respectively, for detecting truss tomatoes. These scores are 2.36 and 1.82 percentage points higher than those of the baseline YOLOv5 algorithm. Notably, the model maintains a size of 13.8M and runs in real time at 35.1 frames per second. Analysis of the detection results for large and small objects indicates that the Bi-SPPF module, which emphasizes finer feature details, is better suited to small targets, whereas the SRFM module, with its larger receptive field, is better suited to larger targets. In summary, the BiSR_YOLOv5 results validate the positive impact of accurate identification on subsequent agricultural operations, such as yield estimation or harvest, achieved through a simple maturity algorithm that utilizes the process of "finding flaws."

{"title":"Enhancing steganography capacity through multi-stage generator model in generative adversarial network based image concealment","authors":"Bisma Sultan, Mohd. Arif Wani","doi":"10.1117/1.jei.33.3.033026","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033026","url":null,"abstract":"Traditional steganography algorithms use procedures created by human experts to conceal the secret message inside a cover medium. Generative adversarial networks (GANs) have recently been used to automate this process. However, GAN based steganography has some limitations. The capacity of these models is limited. By increasing the steganography capacity, security is decreased, and distortion is increased. The performance of the extractor network also decreases with increasing the steganography capacity. In this work, an approach for developing a generator model for image steganography is proposed. The approach involves building a generator model, called the late embedding generator model, in two stages. The first stage of the generator model uses only the flattened cover image, and second stage uses a secret message and the first stage’s output to generate the stego image. Furthermore, a dual-training strategy is employed to train the generator network: the first stage focuses on learning fundamental image features through a reconstruction loss, and the second stage is trained with three loss terms, including an adversarial loss, to incorporate the secret message. The proposed approach demonstrates that hiding data only in the deeper layers of the generator network boosts capacity without requiring complex architectures, reducing computational storage requirements. The efficacy of the proposed approach is evaluated by varying the depth of these two stages, resulting in four generator models. A comprehensive set of experiments was performed on the CelebA dataset, which contains more than 200,000 samples. The results show that the late embedding model performs better than the state-of-the-art models. Also, it increases the steganography capacity to more than four times compared with the existing GAN-based steganography methods. The extracted payload achieves an accuracy of 99.98%, with the extractor model successfully decoding the secret message.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"46 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141189241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view stereo of an object immersed in a refractive medium
Robin Bruneau, Baptiste Brument, Lilian Calvet, Matthew Cassidy, Jean Mélou, Yvain Quéau, Jean-Denis Durou, François Lauze
Journal of Electronic Imaging 33(3), May 2024. DOI: 10.1117/1.jei.33.3.033005
Abstract: In this article, we show how to extend the multi-view stereo technique when the object to be reconstructed is inside a transparent but refractive medium, which causes distortions in the images. We provide a theoretical formulation of the problem accounting for a general non-planar shape of the refractive interface, and then a discrete solving method. We also present a pipeline to recover precisely the geometry of the refractive interface, considered as a convex polyhedral object. It is based on the extraction of visible polyhedron vertices from silhouette images and matching across a sequence of images acquired under circular camera motion. These contributions are validated by tests on synthetic and real data.

Polarization spatial and semantic learning lightweight network for underwater salient object detection
Xiaowen Yang, Qingwu Li, Dabing Yu, Zheng Gao, Guanying Huo
Journal of Electronic Imaging 33(3), May 2024. DOI: 10.1117/1.jei.33.3.033010
Abstract: Absorption by the water body and scattering by suspended particles blur object features, which reduces the accuracy of underwater salient object detection (SOD). We therefore propose a polarization spatial and semantic learning lightweight network for underwater SOD. The proposed method is based on a lightweight MobileNetV2 backbone. Because lightweight networks are less capable than deep networks of capturing and learning the features of complex objects, we build dedicated feature extraction and fusion modules at different depth stages of the backbone to enhance its feature learning capability. Specifically, we embed a structural feature learning module in the low-level feature extraction stage and a semantic feature learning module in the high-level feature extraction stage to maintain the spatial consistency of low-level features and the semantic commonality of high-level features. We acquired polarized images of underwater objects in natural underwater scenes and constructed a polarized object detection dataset (PODD) for object detection in the underwater environment. Experimental results show that the proposed method detects objects on the PODD better than other SOD methods. We also conduct comparative experiments on RGB-thermal (RGB-T) and RGB-depth (RGB-D) datasets to verify the generalization of the proposed method.

{"title":"Image watermarking scheme employing Gerchberg–Saxton algorithm and integer wavelet transform","authors":"Chaoxia Zhang, Kaiqi Liang, Shangzhou Zhang, Zhihao Chen","doi":"10.1117/1.jei.33.3.033003","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033003","url":null,"abstract":"Image watermarking technology plays a key role in the protection of intellectual property rights. In addition to digital watermarking, optical watermarking has also been widely considered. A watermarking scheme based on Gerchberg–Saxton (GS) algorithm and integer wavelet transform (IWT) is proposed for image encryption. The scheme uses the unique phase reconstruction characteristics of GS algorithm, which makes it able to deal with a variety of complex local attacks in the process of protection and effectively restore the original image information. The obfuscation of position and numerical value information is realized by means of variable step Joseph space scrambling and pixel value bit processing. The carrier image is decomposed into subbands with different frequencies using IWT, all the information of the secret image is embedded bit by bit, which realizes the hiding of the image information. In addition, the SHA-256 function, the RSA algorithm, and the hyperchaotic system are combined to generate the cipher stream. The experimental results show that the algorithm has good imperceptibility and security, as well as strong robustness to cutting and salt and pepper noise attacks, and can restore the image quality well.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"56 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving image super-resolution with structured knowledge distillation-based multimodal denoising diffusion probabilistic model","authors":"Li Huang, JingKe Yan, Min Wang, Qin Wang","doi":"10.1117/1.jei.33.3.033004","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033004","url":null,"abstract":"In the realm of low-resolution (LR) to high-resolution (HR) image reconstruction, denoising diffusion probabilistic models (DDPMs) are recognized for their superior perceptual quality over other generative models, attributed to their adept handling of various degradation factors in LR images, such as noise and blur. However, DDPMs predominantly focus on a single modality in the super-resolution (SR) image reconstruction from LR images, thus overlooking the rich potential information in multimodal data. This lack of integration and comprehensive processing of multimodal data can impede the full utilization of the complementary characteristics of different data types, limiting their effectiveness across a broad range of applications. Moreover, DDPMs require thousands of evaluations to reconstruct high-quality SR images, which significantly impacts their efficiency. In response to these challenges, a novel multimodal DDPM based on structured knowledge distillation (MKDDPM) is introduced. This approach features a multimodal-based DDPM that effectively leverages sparse prior information from another modality, integrated into the MKDDPM network architecture to optimize the solution space and detail features of the reconstructed image. Furthermore, a structured knowledge distillation method is proposed, leveraging a well-trained DDPM and iteratively learning a new DDPM, with each iteration requiring only half the original sampling steps. This method significantly reduces the number of model sampling steps without compromising on sampling quality. Experimental results demonstrate that MKDDPM, even with a substantially reduced number of diffusion steps, still achieves superior performance, providing a novel solution for single-image SR tasks.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"68 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}