Displays | Pub Date: 2025-04-29 | DOI: 10.1016/j.displa.2025.103062
Yunqi Liu, Xue Ouyang, Xiaohui Cui

Advanced defense against GAN-based facial manipulation: A multi-domain and multi-dimensional feature fusion approach

Abstract: Powerful facial image manipulation offered by encoder-based GAN inversion techniques raises concerns about potential misuse in identity fraud and misinformation. This study introduces the Multi-Domain and Multi-Dimensional Feature Fusion (MDFusion) method, a novel approach that counters encoder-based GAN inversion by generating adversarial samples. Firstly, MDFusion transforms the luminance channel of the target image into spatial, frequency, and spatial-frequency hybrid domains. Secondly, we use a specifically adapted Feature Pyramid Network (FPN) to extract and fuse high-dimensional and low-dimensional features that enhance the robustness of adversarial noise generation. Then, we embed adversarial noise into the spatial-frequency hybrid domain to produce effective adversarial samples. Finally, the adversarial samples are guided by our designed hybrid training loss to achieve a balance between imperceptibility and effectiveness. Tests were conducted on five encoder-based GAN inversion models using the ASR, LPIPS, and FID metrics. These tests demonstrated the superiority of MDFusion over 13 baseline methods, highlighting its robust defense and generalization abilities. The implementation code is available at https://github.com/LuckAlex/MDFusion.

(Displays, Volume 89, Article 103062)
Displays | Pub Date: 2025-04-28 | DOI: 10.1016/j.displa.2025.103068
Kubilay Muhammed Sünnetci
{"title":"Biomedical text-based detection of colon, lung, and thyroid cancer: A deep learning approach with novel dataset","authors":"Kubilay Muhammed Sünnetci","doi":"10.1016/j.displa.2025.103068","DOIUrl":"10.1016/j.displa.2025.103068","url":null,"abstract":"<div><div>Pre-trained Language Models (PLMs) are widely used nowadays and increasingly popular. These models can be used to solve Natural Language Processing (NLP) challenges, and their focus on specific topics allows the models to provide answers to directly relevant issues. As a sub-branch of this, Biomedical Text Classification (BTC) is a fundamental task that can be used in various applications and is used to aid clinical decisions. Therefore, this study detects colon, lung, and thyroid cancer from biomedical texts. A dataset including 3070 biomedical texts is generated by artificial intelligence and used in the study. In this dataset, there are 1020 texts labeled colon cancer, while the number of samples labeled lung and thyroid cancer is equal to 1020 and 1030, respectively. In the study, 70 % of the data is used in the training set, while the remaining data is split for validation and test sets. After preprocessing all the data used in the study, word encoding is used to prepare the model inputs. Furthermore, these documents in the dataset are converted into sequences of numeric indices. Afterward, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional LSTM (BiLSTM), LSTM+LSTM, GRU+GRU, BiLSTM+BiLSTM, and LSTM+GRU+BiLSTM architectures are trained with train and validation sets, and these models are tested with the test set. Both validation and test performances of all developed models are determined, and a Graphical User Interface (GUI) software is prepared in which the most successful architecture has been embedded. The results show that LSTM is the most successful model, and the accuracy and specificity values achieved by this model in the validation set are equal to 91.32 % and 95.67 %, respectively. The F1 score value achieved by this model for the validation set is also equal to 91.32 %. The accuracy, specificity, and F1 score values achieved by this model in the test set are equal to 85.87 %, 92.94 %, and 85.90 %, respectively. The sensitivity values achieved by this model for the validation and test set are 91.33 % and 85.88 %, respectively. These developed models both provide comparative results and have shown successful performances. Focusing these models on specific issues can provide more effective results for related problems. Furthermore, the presentation of a user-friendly GUI application developed in the study allows users to use the models effectively.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"89 ","pages":"Article 103068"},"PeriodicalIF":3.7,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-04-26 | DOI: 10.1016/j.displa.2025.103060
Fan Zhang, Xinhong Zhang

A full-reference image quality assessment method based on visual attention and phase consistency

Abstract: The Human Visual System (HVS) pays more attention to areas of high salience when perceiving image quality. The human eye is highly sensitive to structural changes in an image, and the degree of structural change can be reflected by phase consistency. This paper proposes PCHSIVS, a full-reference image quality assessment method based on multi-feature fusion. Phase consistency similarity, visual saliency similarity, and chrominance similarity are used to generate image quality scores, which are then weighted by visual saliency maps. Experimental results show that the PCHSIVS method is more consistent with the characteristics of human visual perception.

(Displays, Volume 89, Article 103060)
Displays | Pub Date: 2025-04-24 | DOI: 10.1016/j.displa.2025.103061
Yuanhao Cai, Chongchong Jin, Yeyao Chen, Ting Luo, Zhouyan He, Gangyi Jiang

Blind DIBR-synthesized view quality assessment by integrating local geometry and global structure analysis

Abstract: The realization of free viewpoint video (FVV) relies heavily on depth-image-based rendering (DIBR) technology, but the imperfections of DIBR usually lead to local geometric distortions that significantly impact user experience. Therefore, it is crucial to develop a specialized image quality assessment (IQA) model for DIBR-synthesized views. To address this, this paper leverages local geometry and global structure analysis for DIBR-synthesized IQA (LGGS-SIQA). Specifically, in the local geometry-aware feature extraction module, the proposed method introduces an auxiliary task that converts the score learning task into a distortion classification task, aiming to simplify score sample expansion while effectively locating local geometric distortion regions. Based on this, different types of DIBR-synthesized distortions are further detected and weighted to obtain local geometric features. In the global structure-aware feature extraction module, since DIBR-synthesized distortions are mainly concentrated at object edges, the proposed method designs a strategy to extract key structures globally. Statistical analysis of these regions yields robust global structural features. Finally, the two types of features are fused and regressed to obtain the final quality score. Experimental results on public benchmark databases show that the proposed LGGS-SIQA method outperforms existing handcrafted-feature-based and deep learning-based IQA methods. In addition, feature ablation experiments validate the effectiveness of the core components of the proposed LGGS-SIQA method.

(Displays, Volume 89, Article 103061)
Displays | Pub Date: 2025-04-23 | DOI: 10.1016/j.displa.2025.103058
Jiayue Xu, Chao Xu, Jianping Zhao, Cheng Han, Hua Li
{"title":"Mamba4PASS: Vision Mamba for PAnoramic Semantic Segmentation","authors":"Jiayue Xu, Chao Xu, Jianping Zhao, Cheng Han, Hua Li","doi":"10.1016/j.displa.2025.103058","DOIUrl":"10.1016/j.displa.2025.103058","url":null,"abstract":"<div><div>PAnoramic Semantic Segmentation (PASS) is a significant and challenging task in the field of computer vision, aimed at achieving comprehensive scene understanding through an ultra-wide-angle view. However, the equirectangular projection (ERP) with richer contextual information is susceptible to geometric distortion and spatial discontinuity, which undoubtedly impede the efficacy of PASS. Recently, significant progress has been made in PASS, nevertheless, these methods often face a dilemma between global perception and efficient computation, as well as the effective trade-off between image geometric distortion and spatial discontinuity. To address this, we propose a novel framework for PASS, Mamba4PASS, which is more efficient compared to Transformer-based backbone models. We introduce an Incremental Feature Fusion (IFF) module that gradually integrates semantic features from deeper layers with spatial detail features from shallower layers, effectively alleviating the loss of local details caused by State Space Model (SSM). Additionally, we introduce a Spherical Geometry-Aware Deformable Patch Embedding (SGADPE) module, which leverages spherical geometry properties and employs a novel deformable convolution strategy to adapt to ERPs, effectively addressing spatial discontinuities and stabilizing geometric distortions. To the best of our knowledge, this is the first semantic segmentation model for panoramic images based on the Mamba architecture. We explore the potential of this approach for PASS, providing a new solution to this domain, and validate its effectiveness and advantages. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art results compared to existing approaches.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"89 ","pages":"Article 103058"},"PeriodicalIF":3.7,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-04-19 | DOI: 10.1016/j.displa.2025.103056
Hang Zhao, Zitong Wang, Chenyang Li, Rui Zhu, Feiyang Yang

DMCMFuse: A dual-phase model via multi-dimensional cross-scanning state space model for multi-modality medical image fusion

Abstract: Multi-modality medical image fusion is crucial for improving diagnostic accuracy by combining complementary information from different imaging modalities. However, a key challenge is effectively balancing the abundant modality-specific features (e.g., soft tissue details in MRI and bone structure in CT) with the relatively fewer modality-shared features, which often leads to suboptimal fusion outcomes. To address this, we propose DMCMFuse, a dual-phase model for multi-modality medical image fusion that leverages a multi-dimensional cross-scanning state-space model. The model first decomposes the multi-modality images into distinct frequency components to maintain spatial and channel coherence. In the fusion phase, we apply Mamba for the first time in medical image fusion and develop a fusion method that integrates spatial scanning, spatial interaction, and channel scanning. This multi-dimensional cross-scanning approach effectively combines features from each modality, ensuring the retention of both global and local information. Comprehensive experimental results demonstrate that DMCMFuse surpasses state-of-the-art methods, generating fused images of superior quality with enhanced structural consistency and richer feature representation, making it highly effective for medical image analysis and diagnosis.

(Displays, Volume 89, Article 103056)
Displays | Pub Date: 2025-04-15 | DOI: 10.1016/j.displa.2025.103052
Bo Peng, Shidong Xiong, Yangjian Wang, Tingting Zhou, Jinlan Li, Hanguang Xiao, Rong Xiong

MI-DETR: A small object detection model for mixed scenes

Abstract: Object detection is a fundamental task in computer vision with many applications. However, the complexity of real-world scenarios and the small size of objects present significant challenges to achieving rapid and accurate detection in mixed environments. To address these challenges, we propose a novel mixed scene-oriented small object detection model (MI-DETR) designed to enhance detection accuracy and efficiency. The backbone network of MI-DETR incorporates the Fast Fourier Transform, Channel Shuffle, and an Orthogonal Attention Mechanism to improve feature extraction while significantly reducing computational cost. Additionally, we introduce a specialized small object feature layer and a Multi-Scale Feature Fusion (MSFF) module to strengthen the model's feature fusion capabilities. Furthermore, we propose a novel loss function, Focaler-WIoU, which prioritizes high-quality anchor boxes to enhance the detector's performance. We validate the effectiveness of our model through experiments on small object detection datasets covering three complex scenarios. The results show that the proposed MI-DETR model has 40% fewer parameters and requires 5% less computation than the baseline detector. On the three datasets, MI-DETR achieves accuracies of 70.2%, 34.5%, and 34.1%, and small object detection accuracies of 19.8%, 11.5%, and 12.6%, respectively. Its latency also decreases by 0.9 ms, 1.0 ms, and 1.1 ms, respectively, outperforming other real-time detection models of similar size.

(Displays, Volume 88, Article 103052)
Displays | Pub Date: 2025-04-12 | DOI: 10.1016/j.displa.2025.103042
Dongdong Sun, Chuanyun Wang, Tian Wang, Qian Gao, Qiong Liu, Linlin Wang

CLIPFusion: Infrared and visible image fusion network based on image–text large model and adaptive learning

Abstract: The goal of infrared and visible image fusion is to integrate complementary multimodal images to produce highly informative and visually effective fused images, which have a wide range of applications in automated driving, fault diagnosis, and night vision. Since the infrared and visible image fusion task usually lacks real labels as a reference, the design of the loss function is strongly influenced by human subjectivity, which limits model performance. To address the issue of insufficient real labels, this paper designs a prompt generation network based on an image–text large model, which learns text prompts for different types of images by restricting the distances from unimodal image prompts and fused image prompts to their corresponding images in the latent space of the image–text large model. The learned prompt texts are then used as labels for fused image generation by constraining the distance between the fused image and the different prompt texts in that latent space. To further improve the quality of the fused images, fused images generated at different iterations are used to adaptively fine-tune the prompt generation network, continuously improving the quality of the generated prompt-text labels and indirectly improving the visual quality of the fused images. In addition, to minimize the influence of subjective information in the fused image generation process, a 3D convolution-based fused image generation network is proposed to integrate infrared and visible features through adaptive learning in additional dimensions. Extensive experiments show that the proposed model achieves good visual quality and quantitative metrics on infrared–visible image fusion tasks in military, autonomous driving, and low-light scenarios, as well as good generalization ability on multi-focus image fusion and medical image fusion tasks.

(Displays, Volume 89, Article 103042)
Displays | Pub Date: 2025-04-11 | DOI: 10.1016/j.displa.2025.103046
Lei Wang, Qingbo Wu, Desen Yuan, Fanman Meng, Zhengning Wang, King Ngi Ngan
{"title":"Causal perception inspired representation learning for trustworthy image quality assessment","authors":"Lei Wang, Qingbo Wu, Desen Yuan, Fanman Meng, Zhengning Wang, King Ngi Ngan","doi":"10.1016/j.displa.2025.103046","DOIUrl":"10.1016/j.displa.2025.103046","url":null,"abstract":"<div><div>Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) still remains untrustworthy in real-world applications due to its vulnerability to adversarial perturbations. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL). More specifically, we assume that each image is composed of Causal Perception Representation (CPR) and non-causal perception representation (N-CPR). CPR serves as the causation of the subjective quality label, which is invariant to the imperceptible adversarial perturbations. Inversely, N-CPR presents spurious associations with the subjective quality label, which may significantly change with the adversarial perturbations. We propose causal intervention to boost CPR and eliminate N-CPR. Specifically, we first generate a series of N-CPR intervention images, and then minimize the causal invariance loss. Then we propose a SortMask module to reduce Lipschitz and improve robustness. SortMask block small changes around the mean to eliminate N-CPR and can be plug-and-play. Experiments on four benchmark databases show that the proposed CPRL method outperforms many state-of-the-art methods and provides explicit model interpretation. To support reproducible scientific research, we release the code at <span><span>https://clearlovewl.github.io</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"88 ","pages":"Article 103046"},"PeriodicalIF":3.7,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143828927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-04-09 | DOI: 10.1016/j.displa.2025.103053
Li Yu, Jiafu Zhang, Ke Chen, Moncef Gabbouj

Point cloud upsampling via implicit shape priors discovery and refinement

Abstract: Point clouds obtained by scanning sensors are often sparse and non-uniform; therefore, point cloud upsampling is of vital importance. This paper considers geometric priors a rich source of guidance for generating higher-quality point clouds. However, explicitly exploiting geometric priors of the object surface, such as local geometric smoothness and fairness, is less flexible. In light of this, this paper proposes a novel two-stage method that discovers and exploits implicit shape priors, consisting of coarse point cloud upsampling followed by fine-detail refinement. Specifically, in the first stage, we discover geometric priors in an implicit manner via a Dual Transformer, which simultaneously addresses local and global information during feature encoding, while a Neighborhood Refinement module is proposed to handle geometric irregularities and noise by exploiting the feature similarity of neighboring points. Extensive experiments on synthetic and real datasets validate our motivation, demonstrating that our method achieves competitive performance compared to state-of-the-art methods and better results on noisy point clouds. The source code of this work is available at https://github.com/Vencoders/PU-DT.

(Displays, Volume 88, Article 103053)