{"title":"Light field salient object detection network based on feature enhancement and mutual attention","authors":"Xi Zhu, Huai Xia, Xucheng Wang, Zhenrong Zheng","doi":"10.1117/1.jei.33.5.053001","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053001","url":null,"abstract":"Light field salient object detection (SOD) is an essential research topic in computer vision, but robust saliency detection in complex scenes remains very challenging. We propose a new method for accurate and robust light field SOD via convolutional neural networks containing feature enhancement modules. First, the light field dataset is extended by geometric transformations such as stretching, cropping, flipping, and rotating. Next, two feature enhancement modules are designed to extract features from RGB images and depth maps, respectively. The obtained feature maps are fed into a two-stream network to train the light field SOD. We propose a mutual attention approach in this process, extracting and fusing features from RGB images and depth maps. Therefore, our network can generate an accurate saliency map from the input light field images after training. The obtained saliency map can provide reliable a priori information for tasks such as semantic segmentation, target recognition, and visual tracking. Experimental results show that the proposed method achieves excellent detection performance on public benchmark datasets and outperforms state-of-the-art methods. We also verify the generalization and stability of the method in real-world experiments.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"8 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small space target detection using megapixel resolution CeleX-V camera","authors":"Yuanyuan Lv, Liang Zhou, Zhaohui Liu, Wenlong Qiao, Haiyang Zhang","doi":"10.1117/1.jei.33.5.053002","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053002","url":null,"abstract":"An event camera (EC) is a bioinspired vision sensor with the advantages of high temporal resolution, high dynamic range, and low latency. Due to the inherent sparsity of space target imaging data, the EC is an ideal imaging sensor for space target detection. In this work, we detect small space targets using a CeleX-V camera with megapixel resolution. We propose a target detection method based on field segmentation that exploits the event output characteristics of an EC. This method enables real-time monitoring of the spatial positions of space targets within the camera's field of view. The effectiveness of this approach is validated through experiments involving real-world observations of space targets. Using the proposed method, real-time observation of space targets with a megapixel resolution EC becomes feasible, demonstrating substantial practical potential in the field of space target detection.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"3 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video anomaly detection based on frame memory bank and decoupled asymmetric convolutions","authors":"Min Zhao, Chuanxu Wang, Jiajiong Li, Zitai Jiang","doi":"10.1117/1.jei.33.5.053006","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053006","url":null,"abstract":"Video anomaly detection (VAD) is essential for monitoring systems. Prediction-based methods identify anomalies by comparing differences between predicted and real frames. We propose an unsupervised VAD method based on a frame memory bank (FMB) and decoupled asymmetric convolution (DAConv), which addresses three problems encountered with auto-encoders (AEs) in VAD: (1) how to mitigate the noise resulting from inter-frame jitter, which existing methods ignore; (2) how to alleviate the insufficient utilization of temporal information in traditional two-dimensional (2D) convolution and the heavy computational cost of three-dimensional (3D) convolution; and (3) how to make full use of normal data to improve the reliability of anomaly discrimination. Specifically, we first design a separate network to calibrate video frames within the dataset. Second, we design DAConv to extract features from the video, addressing the absence of temporal information in 2D convolutions and the high computational complexity of 3D convolutions. Concurrently, an interval-frame mechanism mitigates the information redundancy caused by data reuse. Finally, we embed an FMB to store features of normal events, amplifying the contrast between normal and abnormal frames. We conduct extensive experiments on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, achieving AUC values of 98.7%, 90.4%, and 74.8%, respectively, which demonstrates the effectiveness of the proposed method.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"105 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative object separation in X-ray images","authors":"Xiaolong Zheng, Yu Zhou, Jia Yao, Liang Zheng","doi":"10.1117/1.jei.33.5.053004","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053004","url":null,"abstract":"X-ray imaging is essential for security inspection; however, the penetrability of X-rays can cause objects within a package to overlap in X-ray images, reducing the accuracy of manual inspection and increasing the difficulty of auxiliary inspection techniques. Existing methods mainly focus on object detection, enhancing models' ability to detect overlapping regions by augmenting image features, including color, texture, and semantic information. However, these approaches do not address the underlying issue of overlap. We propose a novel method for separating overlapping objects in X-ray images from the perspective of image inpainting. Specifically, the separation method uses a vision transformer (ViT) to construct a generative adversarial network (GAN) model that requires a hand-created trimap as input. In addition, we present an end-to-end approach that integrates Mask Region-based Convolutional Neural Network with the separation network to achieve fully automated separation of overlapping objects. Given the lack of datasets appropriate for training separation networks, we created MaskXray, a collection of X-ray images that includes overlapping images, trimaps, and individual object images. Experiments show that the proposed generative separation network accurately separates overlapping objects in X-ray images, demonstrating the efficacy of our approach and making a significant contribution to the field of X-ray image analysis.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"4 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vis-YOLO: a lightweight and efficient image detector for unmanned aerial vehicle small objects","authors":"Xiangyu Deng, Jiangyong Du","doi":"10.1117/1.jei.33.5.053003","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053003","url":null,"abstract":"YOLO-series models are widely used in object detection. Aiming at the challenge of small object detection, we analyze the limitations of existing detection models and propose the Vis-YOLO object detection algorithm based on YOLOv8s. First, the number of down-sampling operations is reduced to retain more features, and the detection head is replaced to adapt to small objects. Then, deformable convolutional networks are used to improve the C2f module, improving its feature extraction ability. Finally, the separation and enhancement attention module is introduced to the model to give more weight to useful information. Experiments show that the improved Vis-YOLO model outperforms the YOLOv8s model on the VisDrone-2019 dataset: precision improves by 5.4%, recall by 6.3%, and mAP50 by 6.8%. Moreover, Vis-YOLO models are smaller and suitable for mobile deployment. This research provides a new approach to small object detection with strong potential for practical application.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double-level deep multi-view collaborative learning for image clustering","authors":"Liang Xiao, Wenzhe Liu","doi":"10.1117/1.jei.33.5.053012","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053012","url":null,"abstract":"Multi-view clustering has garnered significant attention due to its ability to explore shared information from multiple views. Applications of multi-view clustering include image and video analysis, bioinformatics, and social network analysis, in which integrating diverse data sources enhances data understanding and insights. However, existing multi-view models suffer from the following limitations: (1) directly extracting latent representations from raw data using encoders is susceptible to interference from noise and other factors, and (2) complementary information among different views is often overlooked, resulting in the loss of crucial unique information from each view. Therefore, we propose a distinctive double-level deep multi-view collaborative learning approach. Our method further processes the latent representations learned by the encoder through multiple layers of perceptrons to obtain richer semantic information. In addition, we introduce dual-path guidance at both the feature and label levels to facilitate the learning of complementary information across different views. Furthermore, we introduce pre-clustering methods to guide mutual learning among different views through pseudo-labels. Experimental results on four image datasets (Caltech-5V, STL-10, CIFAR-10, CIFAR-100) demonstrate that our method achieves state-of-the-art clustering performance, evaluated using standard metrics including accuracy, normalized mutual information, and purity. We compare our proposed method with existing clustering algorithms to validate its effectiveness.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"6 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"USDAP: universal source-free domain adaptation based on prompt learning","authors":"Xun Shao, Mingwen Shao, Sijie Chen, Yuanyuan Liu","doi":"10.1117/1.jei.33.5.053015","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053015","url":null,"abstract":"Universal source-free domain adaptation (USFDA) aims to transfer domain-consistent knowledge in the presence of domain shift and category shift, without access to a source domain. Existing works mainly rely on prior domain-invariant knowledge provided by the source model, ignoring the significant discrepancy between the source and target domains. However, directly utilizing the source model generates noisy pseudo-labels on the target domain, resulting in erroneous decision boundaries. To alleviate this issue, we propose a two-stage USFDA approach based on prompt learning, named USDAP. First, to reduce domain differences, during the prompt learning stage, we introduce a learnable prompt designed to align the target domain distribution with the source. Furthermore, for more discriminative decision boundaries, in the feature alignment stage, we propose an adaptive global-local clustering strategy. This strategy utilizes one-versus-all clustering globally to separate different categories and neighbor-to-neighbor clustering locally to prevent incorrect pseudo-label assignments at cluster boundaries. Based on this two-stage method, target data are adapted to the classification network under the prompt's guidance, forming more compact category clusters and thus achieving excellent transfer performance. We conduct experiments on various datasets with diverse category shift scenarios to illustrate the superiority of our USDAP.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"105 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DTSIDNet: a discrete wavelet and transformer based network for single image denoising","authors":"Cong Hu, Yang Qu, Yuan-Bo Li, Xiao-Jun Wu","doi":"10.1117/1.jei.33.5.053007","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053007","url":null,"abstract":"Recent advancements in transformer architectures have significantly enhanced image-denoising algorithms, surpassing the limitations of traditional convolutional neural networks by more effectively modeling global interactions through advanced attention mechanisms. In the domain of single-image denoising, noise manifests across various scales. This is especially evident in intricate scenarios, necessitating the comprehensive capture of multi-scale information inherent in the image. To address the transformer's lack of multi-scale image analysis capability, a discrete wavelet and transformer based network (DTSIDNet) is proposed. The network resolves the inherent limitations of the transformer architecture by integrating the discrete wavelet transform. DTSIDNet independently manages image data at various scales, which greatly improves both adaptability and efficiency in environments with complex noise. The network's self-attention mechanism dynamically shifts focus among different scales, efficiently capturing an extensive array of image features and thereby significantly enhancing the denoising outcome. This approach not only boosts the precision of denoising but also improves the utilization of computational resources, striking an optimal balance between efficiency and high performance. Experiments on real-world and synthetic noise scenarios show that DTSIDNet delivers high image quality with low computational demands, indicating its superior performance in denoising tasks with efficient resource use.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward effective local dimming-driven liquid crystal displays: a deep curve estimation–based adaptive compensation solution","authors":"Tianshan Liu, Kin-Man Lam","doi":"10.1117/1.jei.33.5.053005","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053005","url":null,"abstract":"Local backlight dimming (LBD) is a promising technique for improving the contrast ratio and reducing power consumption in liquid crystal displays. LBD consists of two crucial parts: backlight luminance determination, which locally controls the luminance of each sub-block of the backlight unit (BLU), and pixel compensation, which compensates for the reduction of pixel intensity to achieve pleasing visual quality. However, the limitations of current deep learning–based pixel compensation methods come from two aspects. First, it is difficult for a vanilla image-to-image translation strategy to learn the mapping relations between the input image and the compensated image, especially without considering the dimming levels. Second, the extensive model parameters make these methods difficult to deploy in industrial applications. To address these issues, we reformulate pixel compensation as an input-specific curve estimation task. Specifically, a deep lightweight network, namely, the curve estimation network (CENet), takes both the original input image and the dimmed BLUs as input to estimate a set of high-order curves. These curves are then applied iteratively to adjust the intensity of each pixel, yielding a compensated image. Given the determined BLUs, the proposed CENet can be trained in an end-to-end manner, which means that our method is compatible with any backlight dimming strategy. Extensive evaluation results on the DIVerse 2K (DIV2K) dataset highlight the superiority of the proposed CENet-integrated local dimming framework in terms of model size and visual quality of displayed content.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"20 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMLoc: spatial multilayer perception-guided camera localization","authors":"Jingyuan Feng, Shengsheng Wang, Haonan Sun","doi":"10.1117/1.jei.33.5.053013","DOIUrl":"https://doi.org/10.1117/1.jei.33.5.053013","url":null,"abstract":"Camera localization is a technique for obtaining the camera's six degrees of freedom using the camera as a sensor input. It is widely used in augmented reality, autonomous driving, virtual reality, etc. In recent years, with the development of deep-learning technology, absolute pose regression has gained wide attention as an end-to-end learning-based localization method. The typical architecture consists of a convolutional backbone and a multilayer perceptron (MLP) regression head composed of multiple fully connected layers. Typically, the two-dimensional feature maps extracted by the convolutional backbone must be flattened and passed into the fully connected layers for pose regression. However, this operation results in the loss of crucial pixel position information carried by the two-dimensional feature map and adversely affects the accuracy of pose estimation. We propose a parallel structure, termed SMLoc, that uses a spatial MLP to aggregate position and orientation information from feature maps, respectively, reducing the loss of pixel position information. Our approach achieves superior performance on common indoor and outdoor datasets.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"734 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}