Image and Vision Computing: Latest Articles

A human layout consistency framework for image-based virtual try-on
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.imavis.2025.105831
Rong Huang, Zhicheng Wang, Hao Liu, Aihua Dong
Abstract: Image-based virtual try-on, commonly framed as a generative image-to-image translation task, has attracted significant research interest because it eliminates the need for costly 3D scanning devices. Image inpainting and cycle-consistency have been the dominant frameworks in this field, but they still face challenges in cross-attribute adaptation and in parameter sharing between try-on networks. This paper proposes a new framework, termed human layout consistency, based on the intuitive insight that a high-quality try-on result should align with a coherent human layout. Under the proposed framework, a try-on network is equipped with an upstream Human Layout Generator (HLG) and a downstream Human Layout Parser (HLP). The former generates the expected human layout as if the person were wearing the selected target garment, while the latter extracts the actual human layout parsed from the try-on result. The supervisory signals, which require no ground-truth image pairs, are constructed by assessing the consistency between the expected and actual human layouts. We design a dual-phase training strategy: first warming up HLG and HLP, then training the try-on network with the supervisory signals based on human layout consistency. On this basis, the framework enables arbitrary selection of target garments during training, endowing the try-on network with cross-attribute adaptation. Moreover, it operates with a single try-on network rather than two physically separate ones, thereby avoiding the parameter-sharing issue. We conducted both qualitative and quantitative experiments on the benchmark VITON dataset. Experimental results demonstrate that our proposal generates high-quality try-on results, outperforming baselines by margins of 0.75% to 10.58%. Ablation and visualization results further show that the method adapts well to cross-attribute translations, underscoring its potential for practical application.
(Image and Vision Computing, Vol. 165, Article 105831)
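The layout-consistency supervision can be illustrated with a toy sketch: if the expected layout (from HLG) and the actual layout (parsed by HLP) are per-pixel class-probability maps, their mismatch can be scored with a cross-entropy term. The shapes, class set, and the particular cross-entropy form below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def layout_consistency_loss(expected, actual, eps=1e-8):
    """Pixel-wise cross-entropy between an expected human layout
    (HLG output, soft class probabilities per pixel) and the actual
    layout parsed from the try-on result (HLP output).
    Both arrays have shape (H, W, C) and sum to 1 over the class axis."""
    return float(-np.mean(np.sum(expected * np.log(actual + eps), axis=-1)))

# Toy 2x2 layouts with 3 semantic classes (e.g. skin, garment, background).
expected = np.array([[[1, 0, 0], [0, 1, 0]],
                     [[0, 0, 1], [0, 1, 0]]], dtype=float)
perfect = expected.copy()
shifted = np.roll(expected, 1, axis=2)  # every pixel gets the wrong class

assert layout_consistency_loss(expected, perfect) < 1e-6
assert layout_consistency_loss(expected, shifted) > 1.0
```

A matching layout drives the loss toward zero, while a layout that disagrees everywhere is heavily penalized, which is the signal that replaces paired ground-truth images.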
Citations: 0
Advanced fusion of IoT and AI technologies for smart environments: Enhancing environmental perception and mobility solutions for visually impaired individuals
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.imavis.2025.105827
Nouf Nawar Alotaibi, Mrim M. Alnfiai, Mona Mohammed Alnahari, Salma Mohsen M. Alnefaie, Faiz Abdullah Alotaibi
Objective: To develop a robust model that integrates multiple sensor modalities to enhance environmental perception and mobility for visually impaired individuals, improving their autonomy and safety in both indoor and outdoor settings.
Methods: The proposed system integrates data from proximity, ambient-light, and motion sensors through recursive Bayesian filtering, kernel-based fusion algorithms, and probabilistic graphical models. The study, which combined Internet of Things (IoT) and Artificial Intelligence (AI) techniques in a multidisciplinary approach, was conducted over six months (April 2024 to September 2024) in Saudi Arabia using resources from Najran University. Data collection involved deploying IoT devices across diverse indoor and outdoor environments, including residential areas, commercial spaces, and urban streets, under different lighting, weather, and dynamic conditions to ensure real-world applicability. The collected dataset was used to train and validate the model's real-time environmental-context estimation and motion-activity detection, following a rigorous process to ensure reliability and scalability across scenarios. Ethical considerations were adhered to throughout the project, with no direct interaction with human subjects.
Results: The proposed model achieved an accuracy of 85% in predicting environmental context and 82% in motion detection, with precision and F1-scores of 88% and 85%, respectively. Real-time implementation provided reliable, dynamic feedback on environmental changes and motion activities, significantly enhancing situational awareness.
Conclusion: The proposed model effectively combines sensor data to deliver real-time, context-aware assistance for visually impaired individuals, improving their ability to navigate complex environments. The system represents a significant advance in assistive technology and holds promise for broader applications with further enhancements.
(Image and Vision Computing, Vol. 165, Article 105827)
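Recursive Bayesian filtering of this kind can be sketched as a discrete Bayes filter: predict the context with a transition model, then reweight by the sensor likelihood and renormalize. The two-state context set and all numbers below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

STATES = ["indoor", "outdoor"]  # simplified environmental contexts
T = np.array([[0.9, 0.1],       # transition model: contexts tend to persist
              [0.1, 0.9]])

def bayes_update(belief, likelihood):
    """One recursive Bayesian filtering step: predict with the transition
    model, then weight by the sensor likelihood and renormalize."""
    predicted = T.T @ belief
    posterior = predicted * likelihood
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])
# Three ambient-light readings, each 4x more likely when outdoors:
for _ in range(3):
    belief = bayes_update(belief, np.array([0.2, 0.8]))

assert belief[1] > 0.9  # the filter converges on "outdoor"
```

In a real deployment the likelihood vector would come from per-sensor observation models (proximity, light, motion), fused at each step.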
Citations: 0
LET-CViT: A low-light enhanced two-stream CNN and vision transformer for Deepfake detection
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-20 | DOI: 10.1016/j.imavis.2025.105828
Gaoming Yang, Yifan Song, Xiangyu Yang, Ji Zhang
Abstract: With the development of generative technologies, fake faces have become increasingly realistic, and unknown forgery methods and complex generation environments make Deepfake detection challenging. Existing detectors can identify most forged images under normal lighting, but their performance deteriorates in other lighting environments, especially under low light. To address this, we present a novel Low-light Enhanced Two-stream CNN and Vision Transformer (LET-CViT) framework, built from our improved ReLU-CBAM Depthwise Separable Convolution (RC-DSC) block and Dynamic Sigmoid-Gated Multi-Head Attention (DSG-MHA) block, together with two new modules: Low-light Enhancement with Denoising (LED) and Wavelet Transform high-frequency Fusion (WTF). The LED module improves low-light image quality and exposes fake textures through light enhancement and directional denoising. The WTF module captures multi-scale features and focuses on high-frequency information by repeatedly fusing the high-frequency sub-bands of a discrete wavelet transform, while suppressing interference from low-frequency information. Extensive experiments on several datasets show that our framework reliably detects forged videos under low-light conditions. The AUCs on the unseen DeeperForensics-1.0 and DFD datasets reach 95.73% and 95.24%, respectively, significantly outperforming other mainstream models. The code for reproducing our results is publicly available at https://github.com/SYF-code/LET-CViT.
(Image and Vision Computing, Vol. 165, Article 105828)
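The WTF module's starting point, a discrete wavelet transform separating an image into one low-frequency and three high-frequency sub-bands, can be sketched with a single-level Haar transform. Haar is a minimal stand-in here, chosen for clarity; the paper does not specify that this is its wavelet.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar wavelet transform. Returns the low-frequency
    approximation (LL) and the three high-frequency sub-bands (LH, HL, HH)
    that carry the edge/texture cues forgery detectors focus on."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency content
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

flat = np.full((4, 4), 7.0)                 # constant image: no detail
ll, lh, hl, hh = haar_dwt2(flat)
assert np.allclose(ll, 7.0) and np.allclose([lh, hl, hh], 0.0)

edge = np.zeros((4, 4)); edge[:, 1:] = 1.0  # vertical edge inside a block
_, lh, hl, hh = haar_dwt2(edge)
assert not np.allclose(hl, 0.0)             # edge energy lands in HL
```

A WTF-style pipeline would then fuse LH/HL/HH (where splicing artifacts concentrate) while down-weighting LL.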
Citations: 0
LSBE-Net: Semantic segmentation of large-scale point cloud scenes via local boundary feature and spatial attention aggregation
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.imavis.2025.105798
Hailang Wang, Keke Duan, Mingzi Zhang, Li Ma
Abstract: 3D point cloud semantic segmentation plays a pivotal role in comprehending 3D scenes and facilitating environmental perception. Existing studies predominantly emphasize the extraction of local geometric structures but often overlook local boundary cues and long-range spatial relationships, which hampers precise delineation of object boundaries and impairs the distinction of long-distance instances. To address these challenges, we propose LSBE-Net, a novel segmentation algorithm that extracts local boundary features and integrates spatial context. The Local Surface Representation (LSR) module captures local geometric shapes by encoding both surface and positional features, providing critical structural information. The Local Boundary Enhancement (LBE) module extracts boundary features and fuses them with geometric and semantic features through a transformer mechanism within local neighborhoods, enabling the learning of contextual relationships and refining boundary delineation. These features are aggregated by the Spatial Encoding Attention (SEA) module, which learns long-range dependencies and spatial relationships across the point cloud. LSBE-Net is extensively evaluated on three large-scale benchmark datasets: S3DIS, Toronto3D, and Semantic3D. Our method achieves competitive mean Intersection over Union (mIoU) scores of 66.1%, 82.3%, and 78.0%, respectively, demonstrating its effectiveness and robustness in diverse real-world scenarios.
(Image and Vision Computing, Vol. 165, Article 105798)
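The mIoU metric reported here is the standard one: per-class intersection over union, averaged over classes. A minimal sketch for integer label arrays:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union: per-class IoU averaged over classes
    that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both: skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
# class 0: IoU 2/2; class 1: 1/2; class 2: 2/3 -> mean = (1 + 0.5 + 2/3) / 3
assert abs(mean_iou(pred, gt, 3) - (1 + 0.5 + 2/3) / 3) < 1e-9
```

For point clouds, `pred` and `gt` are simply the flattened per-point labels.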
Citations: 0
W-MambaFuse: A wavelet decomposition and adaptive state-space modeling approach for anatomical and functional image fusion
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.imavis.2025.105796
Bowen Zhong, Shijie Li, Xuan Deng, Zheng Li
Abstract: Anatomical-functional image fusion plays a critical role in a variety of medical and biological applications. Current convolutional neural network-based fusion algorithms are constrained by limited receptive fields, impeding effective modeling of long-range dependencies in medical images, while transformer-based architectures have global modeling capability but suffer from the quadratic complexity of self-attention. To address these limitations, we propose W-MambaFuse, a network based on wavelet-domain decomposition and an adaptive, selectively structured state-space model for anatomical and functional image fusion. The network first applies a wavelet transform to enlarge the receptive field of the convolutional layers, facilitating the capture of low-frequency structural outlines and high-frequency textural primitives. We further develop an adaptive gated fusion module, CNN-Mamba Gated (MCG), which combines the dynamic modeling capability of state-space models with the local feature extraction strengths of convolutional neural networks, enabling effective extraction of both intra-modal and inter-modal features and thereby enhancing multimodal image fusion. Experimental results on benchmark datasets demonstrate that W-MambaFuse consistently outperforms pure CNN-based models, transformer-based models, and CNN-transformer hybrids in both visual quality and quantitative evaluations. Our code is publicly available at https://github.com/Bowen-Zhong/W-Mamba.
(Image and Vision Computing, Vol. 165, Article 105796)
Citations: 0
CI-TransCNN: A class imbalance hybrid CNN-Transformer network for facial attribute recognition
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-10 | DOI: 10.1016/j.imavis.2025.105823
Yanfei Liu, Youchang Shi, Yufei Long, Miaosen Xu, Junhua Chen, Yuanqian Li, Hao Wen
Abstract: Recent facial attribute recognition (FAR) methods often struggle to capture global dependencies and are further challenged by severe class imbalance, large intra-class variation, and high inter-class similarity, which limit their overall performance. To address these challenges, we propose Class Imbalance Transformer-CNN (CI-TransCNN), a network combining CNN and Transformer for facial attribute recognition, consisting mainly of a TransCNN backbone and a Dual Attention Feature Fusion (DAFF) module. In TransCNN, we incorporate Structure Self-Attention (StructSA) to better exploit structural patterns in images and propose an Inverted Residual Convolutional GLU (IRC-GLU) to enhance model robustness; this design lets TransCNN capture multi-level, multi-scale features while integrating global and local information. DAFF fuses the features extracted from TransCNN, using spatial and channel attention to further improve their discriminability. Moreover, a Class-Imbalance Binary Cross-Entropy (CIBCE) loss is proposed to improve performance on datasets with class imbalance, large intra-class variation, and high inter-class similarity. Experimental results on the CelebA and LFWA datasets show that our method effectively addresses class imbalance and achieves superior performance compared to existing state-of-the-art CNN- and Transformer-based FAR approaches.
(Image and Vision Computing, Vol. 165, Article 105823)
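The general idea behind a class-imbalance-aware BCE, up-weighting rare attributes so majority classes do not dominate the loss, can be sketched as below. The inverse-frequency weighting shown is a common choice, not necessarily the paper's exact CIBCE formulation.

```python
import numpy as np

def weighted_bce(probs, labels, pos_freq, eps=1e-7):
    """Binary cross-entropy with per-attribute reweighting: attributes with
    a low positive frequency get a larger positive weight, so missing a
    rare positive costs more than missing a common one."""
    w_pos = 1.0 / (pos_freq + eps)        # rare attribute -> big weight
    w_neg = 1.0 / (1.0 - pos_freq + eps)
    loss = -(w_pos * labels * np.log(probs + eps)
             + w_neg * (1 - labels) * np.log(1 - probs + eps))
    return float(loss.mean())

# Two attributes: "smiling" (50% positive) and "bald" (2% positive).
pos_freq = np.array([0.5, 0.02])
labels   = np.array([1.0, 1.0])
probs    = np.array([0.6, 0.6])          # same prediction for both
per_attr = -1.0 / pos_freq * np.log(probs)
assert per_attr[1] > per_attr[0]         # missing a rare positive costs more
assert weighted_bce(probs, labels, pos_freq) > 0
```

With equal predictions, the rare "bald" attribute contributes a much larger gradient, which is the mechanism that counteracts imbalance.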
Citations: 0
Semantic-assisted unpaired image dehazing
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2026-01-01 | Epub Date: 2025-11-06 | DOI: 10.1016/j.imavis.2025.105818
Yang Yang, Lei Zhang, Ke Pang, Tongtong Chen, Xiaodong Yue
Abstract: A series of innovative unpaired image dehazing techniques has recently been introduced, relieving the pressure of collecting paired data, yet these methods typically overlook semantic information, which is essential for a more comprehensive dehazing process. Our work bridges this gap with a method that fully integrates semantic information into unpaired image dehazing. Specifically, we propose a semantic-information-guided feature enhancement and fusion block that selectively fuses refined features, guided by the semantic result layer and semantic feature layer, according to the uncertainty of the semantic information. Our method also uses semantic information to guide haze generation during training, producing a more diverse set of hazy images and in turn improving dehazing performance. Furthermore, we introduce a loss term that constrains the semantic information entropy of the dehazing results, ensuring the dehazed images are not only clear but also semantically accurate and complete. Extensive experiments validate our superiority over other methods and the effectiveness of our designs. The code is available.
(Image and Vision Computing, Vol. 165, Article 105818)
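The semantic-information-entropy constraint can be sketched as the mean per-pixel entropy of a segmenter's class probabilities over the dehazed image: low entropy means confident, semantically crisp predictions. A minimal illustration; the paper's exact loss term may differ.

```python
import numpy as np

def semantic_entropy(probs, eps=1e-8):
    """Mean per-pixel Shannon entropy of semantic class probabilities,
    shape (H, W, C). Penalizing this quantity pushes the dehazing network
    toward outputs the segmenter can classify confidently."""
    return float(-np.mean(np.sum(probs * np.log(probs + eps), axis=-1)))

confident = np.array([[[0.98, 0.01, 0.01]]])  # crisp: one dominant class
uncertain = np.array([[[1/3, 1/3, 1/3]]])     # hazy: uniform over classes
assert semantic_entropy(confident) < semantic_entropy(uncertain)
```

Residual haze tends to flatten the segmenter's class distribution, so minimizing this entropy doubles as a semantic-integrity signal without paired supervision.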
Citations: 0
Single stage weakly supervised semantic segmentation via enhanced patch affinity
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2025-12-01 | Epub Date: 2025-10-15 | DOI: 10.1016/j.imavis.2025.105791
Jingjie Jiang, Yuhui Zheng, Guoqing Zhang
Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels typically employs class activation maps (CAMs) to generate pseudo-labels. Existing WSSS methods, whether CNN- or Transformer-based, predominantly adopt multi-stage pipelines that entail stage-wise training and disparate strategies, resulting in complex inter-stage interactions. Furthermore, prior approaches frequently optimize CAMs directly via patch affinity in a Vision Transformer (ViT), a computationally intensive process that may lead to excessive background activation and blurred object boundaries. To address these limitations, we propose SSEPA (Single Stage WSSS with Enhanced Patch Affinity), a single-stage method that optimizes initial CAMs end-to-end via patch affinity. To further enhance patch affinity in attention maps, we propose the Adaptive Layer Attention Fusion (ALAF) module, which weighs the attention from layers at different depths and fuses them through dynamic weight vectors. Experiments on the PASCAL VOC and MS COCO datasets show that our method significantly improves the quality of CAMs and segmentation models. Compared to previous single-stage methods, SSEPA exhibits a lower misclassification probability and produces more precise object boundaries, fully verifying the effectiveness of our approach.
(Image and Vision Computing, Vol. 164, Article 105791)
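The CAMs that seed WSSS pseudo-labels are conventionally computed by weighting the network's final feature maps with the classifier weights of the target class; a minimal sketch of that standard construction:

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Class activation map: weight the final feature maps (C, H, W) by the
    classifier weights (C,) for the target class and sum over channels.
    High responses mark regions driving that class's prediction, which
    WSSS pipelines threshold into segmentation pseudo-labels."""
    cam = np.tensordot(class_weights, features, axes=1)  # -> (H, W)
    cam = np.maximum(cam, 0)                # keep positive evidence only
    return cam / (cam.max() + 1e-8)         # normalize to [0, 1]

feats = np.zeros((2, 4, 4))
feats[0, 1:3, 1:3] = 1.0                    # channel 0 fires on a central object
w = np.array([2.0, 0.0])                    # class relies on channel 0
cam = class_activation_map(feats, w)
assert cam[2, 2] > 0.999 and cam[0, 0] == 0.0
```

Methods like SSEPA then refine this map, here via ViT patch affinity, so that activation covers the whole object instead of only its most discriminative part.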
Citations: 0
Simultaneous acquisition of geometry and material for translucent objects
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2025-12-01 | Epub Date: 2025-10-24 | DOI: 10.1016/j.imavis.2025.105793
Chenhao Li, Trung Thanh Ngo, Hajime Nagahara
Abstract: Reconstructing the geometry and material properties of translucent objects from images is challenging because of the complex light propagation in translucent media and the inherent ambiguity of inverse rendering. Previous works therefore often assume the objects are opaque or describe translucency with a simplified model, which significantly degrades reconstruction quality and limits downstream tasks such as relighting and material editing. We present a novel framework that tackles this challenge through a combination of physically grounded and data-driven strategies. At its core is a hybrid rendering supervision scheme that fuses a differentiable physical renderer with a learned neural renderer to guide reconstruction; to further strengthen supervision, we introduce an augmented loss tailored to the neural renderer. Our system takes a flash/no-flash image pair as input, enabling it to disambiguate the complex light propagation inside translucent objects. We train our model on a large-scale synthetic dataset of 117K scenes and evaluate on both synthetic benchmarks and real-world captures. To mitigate the domain gap between synthetic and real data, we contribute a new real-world dataset with ground-truth surface normals and fine-tune our model accordingly. Extensive experiments validate the robustness and accuracy of our method across diverse scenarios.
(Image and Vision Computing, Vol. 164, Article 105793)
Citations: 0
SPM-CyViT: A self-supervised pre-trained cycle-consistent vision transformer with multi-branch for contrast-enhanced CT synthesis
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing | Pub Date: 2025-12-01 | Epub Date: 2025-10-31 | DOI: 10.1016/j.imavis.2025.105802
Hongwei Yang, Wen Zeng, Ke Chen, Zhan Hua, Yan Zhuang, Lin Han, Guoliang Liao, Yiteng Zhang, Hanyu Li, Zhenlin Li, Jiangli Lin
Abstract: Contrast-enhanced computed tomography (CECT) is crucial for assessing vascular anatomy and pathology, but iodine contrast medium poses risks, including anaphylactic shock and acute kidney injury. To address this, we propose SPM-CyViT, a self-supervised pre-trained, multi-branch, cycle-consistent vision transformer that synthesizes high-quality virtual CECT from non-contrast CT (NCCT). Its generator uses a parallel encoding approach, combining vision transformer blocks with convolutional downsampling layers; their outputs are fused through a tailored cross-attention module, producing feature maps with multi-scale complementary properties. Through masked reconstruction, the ViT global encoder supports self-supervised pre-training on diverse unlabeled CT slices, overcoming fixed-dataset limitations and significantly improving generalization. The model also features a multi-branch decoder-discriminator design tailored to specific labels and incorporates 40 keV monoenergetic enhanced CT (MonoE) as an auxiliary label to optimize contrast-sensitive regions. Results on the dual-center internal test set demonstrate that SPM-CyViT outperforms existing CECT synthesis models across all quantitative metrics. Furthermore, benchmarked against real CECT, three radiologists awarded SPM-CyViT an average clinical evaluation score of 4.21/5.00 across multiple perspectives. SPM-CyViT also generalizes robustly on the external test set, achieving a mean CNR of 10.96 for synthesized CECT, close to the 12.38 of real CECT, collectively underscoring its clinical application potential.
(Image and Vision Computing, Vol. 164, Article 105802)
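CNR is a standard contrast metric; one common formulation (the paper may use a different variant) divides the mean difference between an enhancing region and background by the background noise:

```python
import numpy as np

def cnr(roi, background):
    """Contrast-to-noise ratio: absolute difference of the mean intensities
    of a region of interest and the background, divided by the background
    standard deviation (the noise)."""
    return abs(roi.mean() - background.mean()) / (background.std() + 1e-8)

rng = np.random.default_rng(0)
bg  = rng.normal(50.0, 5.0, 1000)   # soft tissue, Hounsfield units
roi = rng.normal(250.0, 5.0, 1000)  # contrast-enhanced vessel, HU
assert cnr(roi, bg) > 20            # strong enhancement is clearly visible
```

Comparing CNR between synthesized and real CECT, as done above for the external test set, quantifies how much of the genuine contrast enhancement the synthesis preserves.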
Citations: 0