Image and Vision Computing: Latest Publications

Shape-from-template with generalised camera
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-26 | DOI: 10.1016/j.imavis.2025.105579
Agniva Sengupta, Stefan Zachow

This article presents a new method for non-rigidly registering a 3D shape to 2D keypoints observed by a constellation of multiple cameras. Non-rigid registration of a 3D shape to observed 2D keypoints, i.e., Shape-from-Template (SfT), has been widely studied using single images, but SfT with information from multiple cameras jointly opens new directions for extending the scope of known use-cases, such as 3D shape registration in medical imaging and registration from hand-held cameras, to name a few. We represent such a multi-camera setup with the generalised camera model; therefore, any collection of perspective or orthographic cameras observing any deforming object can be registered. We propose multiple approaches for such SfT: a first approach where the corresponded keypoints lie on a direction vector from a known 3D point in space, a second approach where the corresponded keypoints lie on a direction vector from an unknown 3D point in space but with known orientation w.r.t. some local reference frame, and a third approach where, apart from correspondences, the silhouette of the imaged object is also known. Together, these form the first set of solutions to the SfT problem with generalised cameras. The key idea behind SfT with a generalised camera is the improved reconstruction accuracy obtained by estimating the deformed shape while exploiting the mutual constraints between multiple views of the deforming object. The correspondence-based approaches are solved with convex programming, while the silhouette-based approach iteratively refines the results of the convex solutions. We demonstrate the accuracy of our proposed methods on extensive synthetic and real data.

Image and Vision Computing, Volume 162, Article 105579 | Citations: 0
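The generalised camera model described above treats each 2D keypoint as a ray in space, so a registered template point should lie on the ray of its corresponding observation. As a rough illustration only (the paper solves the full problem with convex programming over the whole template, which is not reproduced here), the sketch below measures the perpendicular distance from a deformed template point to one observation ray; the function and variable names are hypothetical.

```python
import numpy as np

def ray_residual(v, origin, direction):
    """Perpendicular distance from a deformed template point v to the
    observation ray (origin + t * direction) of a generalised camera."""
    d = direction / np.linalg.norm(direction)
    diff = v - origin
    # Component of diff orthogonal to the ray direction.
    return np.linalg.norm(diff - np.dot(diff, d) * d)

# Toy check: a point exactly on the ray gives a zero residual.
o = np.array([0.0, 0.0, 0.0])
d = np.array([0.0, 0.0, 1.0])
print(ray_residual(np.array([0.0, 0.0, 2.5]), o, d))  # ~0.0
print(ray_residual(np.array([0.1, 0.0, 2.5]), o, d))  # ~0.1
```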
CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-26 | DOI: 10.1016/j.imavis.2025.105674
Hussein Hasan, Miguel Angel Garcia, Hatem Rashwan, Domenec Puig

Camera localization is the process of automatically determining the position and orientation of a camera within its 3D environment from the images it captures. Traditional camera localization methods often rely on Convolutional Neural Networks, which are effective at extracting local visual features but struggle to capture the long-range dependencies critical for accurate localization. In contrast, Transformer-based approaches model global contextual relationships well, although they often lack precision in fine-grained spatial representations. To bridge this gap, we introduce CoHAtNet, a novel Convolutional Hybrid-Attention Network that tightly integrates convolutional and self-attention mechanisms.

Unlike previous hybrid models that stack convolutional and attention layers separately, CoHAtNet embeds local features extracted via Mobile Inverted Bottleneck Convolution (MBConv) blocks directly into the Value component of the Transformer self-attention mechanism. This yields a hybrid self-attention block capable of dynamically capturing both local spatial detail and global semantic context within a single attention layer. Additionally, CoHAtNet enables modality-level fusion by processing RGB and depth data jointly in a unified pipeline, allowing the model to leverage complementary appearance and geometric cues throughout.

Extensive evaluations were conducted on two widely used camera localization datasets: 7-Scenes (RGB-D) and Cambridge Landmarks (RGB). Experimental results show that CoHAtNet achieves state-of-the-art performance in both translation and orientation accuracy. These results highlight the effectiveness of the hybrid design in challenging indoor and outdoor environments and make CoHAtNet a strong candidate for end-to-end camera localization tasks.

Image and Vision Computing, Volume 162, Article 105674 | Citations: 0
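The abstract's central mechanism, injecting conv-derived local features into the Value path of self-attention, can be illustrated with a minimal single-head PyTorch sketch. This is not the published CoHAtNet block: the MBConv branch is replaced by a simple depthwise-separable convolution, and all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridSelfAttention(nn.Module):
    """Toy single-head hybrid attention: queries/keys are linear projections of
    the flattened feature map, while values come from a convolutional branch
    (a depthwise-separable conv here, standing in for an MBConv block)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v_conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depthwise
            nn.Conv2d(dim, dim, 1),                          # pointwise
        )
        self.scale = dim ** -0.5

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, HW, C)
        q, k = self.q(tokens), self.k(tokens)
        v = self.v_conv(x).flatten(2).transpose(1, 2)   # conv-derived values
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = attn @ v                           # (B, HW, C)
        return out.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(2, 64, 16, 16)
print(HybridSelfAttention(64)(feat).shape)       # torch.Size([2, 64, 16, 16])
```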
APS-NeuS: Adaptive planar and skip-sampling for 3D object surface reconstruction in high-specular scenes
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-25 | DOI: 10.1016/j.imavis.2025.105665
Wei Gao, Li Jin, Youssef Akoudad, Yang Yang

High-fidelity 3D object surface reconstruction remains challenging in real-world scenes with strong specular reflections, where multi-view consistency is disrupted by reflection artifacts. To address this, we propose APS-NeuS, an implicit neural rendering framework designed to robustly separate target objects from reflective interference. Specifically, we establish a pixel-wise auxiliary mirror plane to differentiate reflections from target objects and incorporate a Laplacian gradient to better recover their edges and fine structures. Additionally, we introduce a skip-sampling strategy to reduce the impact of reflective interference, further enhancing multi-view consistency and surface fidelity. Finally, we introduce an exclusion loss that helps the model separate the target objects from the reflective parts more accurately during initialization by comparing gradient differences. Extensive experiments on synthetic and real-world datasets show that APS-NeuS achieves superior reconstruction quality under strong specular reflection, demonstrating its practical applicability to complex environments. Code is available at https://github.com/ujsjl/APS-NeuS.

Image and Vision Computing, Volume 162, Article 105665 | Citations: 0
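The exclusion loss is said to compare gradient differences between the target object and the reflective parts. One common formulation of such a loss in reflection-separation work, offered here only as a guess at the general idea rather than the APS-NeuS definition, penalizes image gradients that are large in both layers simultaneously:

```python
import torch

def spatial_gradients(img):
    """Finite-difference gradients of a (B, C, H, W) image along x and y."""
    gx = img[..., :, 1:] - img[..., :, :-1]
    gy = img[..., 1:, :] - img[..., :-1, :]
    return gx, gy

def exclusion_loss(obj, refl):
    """Penalize edges that appear in both the object layer and the reflection
    layer, encouraging the two layers to claim different image gradients."""
    ox, oy = spatial_gradients(obj)
    rx, ry = spatial_gradients(refl)
    return (torch.abs(ox) * torch.abs(rx)).mean() + \
           (torch.abs(oy) * torch.abs(ry)).mean()

obj = torch.rand(1, 3, 64, 64, requires_grad=True)   # rendered object layer
refl = torch.rand(1, 3, 64, 64, requires_grad=True)  # rendered reflection layer
print(exclusion_loss(obj, refl))   # scalar loss, differentiable w.r.t. both layers
```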
MSDNet: Multi-scale decoder for few-shot semantic segmentation via transformer-guided prototyping
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-25 | DOI: 10.1016/j.imavis.2025.105672
Amirreza Fateh, Mohammad Reza Mohammadi, Mohammad Reza Jahed-Motlagh

Few-shot semantic segmentation addresses the challenge of segmenting objects in query images with only a handful of annotated examples. However, many previous state-of-the-art methods either have to discard intricate local semantic features or suffer from high computational complexity. To address these challenges, we propose a new few-shot semantic segmentation framework based on the Transformer architecture. Our approach introduces a spatial transformer decoder and a contextual mask generation module to improve the relational understanding between support and query images. Moreover, we introduce a multi-scale decoder that refines the segmentation mask by incorporating features from different resolutions in a hierarchical manner. Additionally, our approach integrates global features from intermediate encoder stages to improve contextual understanding, while maintaining a lightweight structure to reduce complexity. This balance between performance and efficiency enables our method to achieve competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings. Notably, our model, with only 1.5 million parameters, demonstrates competitive performance while overcoming the limitations of existing methodologies. Code: https://github.com/amirrezafateh/MSDNet.

Image and Vision Computing, Volume 162, Article 105672 | Citations: 0
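As a generic illustration of a hierarchical multi-scale decoder (not the MSDNet architecture itself, and without its transformer-guided prototyping), the sketch below upsamples the coarsest feature map and fuses it with progressively finer encoder features before a segmentation head; channel sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDecoder(nn.Module):
    """Generic coarse-to-fine decoder: the coarsest feature map is repeatedly
    upsampled and fused with the next finer encoder feature before the final
    segmentation head."""
    def __init__(self, channels=(256, 128, 64), num_classes=2):
        super().__init__()
        self.fuse = nn.ModuleList([
            nn.Conv2d(channels[i] + channels[i + 1], channels[i + 1], 3, padding=1)
            for i in range(len(channels) - 1)
        ])
        self.head = nn.Conv2d(channels[-1], num_classes, 1)

    def forward(self, feats):            # feats ordered coarse -> fine
        x = feats[0]
        for fuse, skip in zip(self.fuse, feats[1:]):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = F.relu(fuse(torch.cat([x, skip], dim=1)))
        return self.head(x)              # logits at the finest resolution

feats = [torch.randn(1, 256, 8, 8), torch.randn(1, 128, 16, 16),
         torch.randn(1, 64, 32, 32)]
print(MultiScaleDecoder()(feats).shape)  # torch.Size([1, 2, 32, 32])
```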
Deepfake detection via Feature Refinement and Enhancement Network
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-24 | DOI: 10.1016/j.imavis.2025.105663
Weicheng Song, Siyou Guo, Mingliang Gao, Qilei Li, Xianxun Zhu, Imad Rida

The rapid advancement of deepfake technology poses significant threats to the integrity and privacy of biometric systems such as facial recognition and voice authentication. To address this issue, there is an urgent need for advanced forensic detection methods that can reliably safeguard biometric data from manipulation and unauthorized access. However, current methods mainly focus on shallow feature extraction and neglect feature refinement and enhancement, which leads to low detection accuracy and poor generalization. To address this problem, we propose the Feature Refinement and Enhancement Network (FRENet) for deepfake detection, leveraging progressive refinement and enhanced mixed feature learning. Specifically, a Low Rank Projected Self-Attention (LPSA) module is introduced for the refinement and enhancement of features. A Patch-based Focused (PatchFocus) module is also proposed to highlight local texture inconsistencies in key regions. In addition, we propose a Refine Fusion (RefFus) module that integrates the refined features with associated noise information to enhance feature separability. Experimental results across five benchmark datasets demonstrate that the proposed FRENet outperforms state-of-the-art methods in terms of both accuracy and generalization. The code is available at https://github.com/weichengsong-code/FRENet.

Image and Vision Computing, Volume 162, Article 105663 | Citations: 0
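The abstract does not define the Low Rank Projected Self-Attention (LPSA) module, so the sketch below shows one plausible reading only: a Linformer-style attention in which keys and values are projected onto a small number of learned tokens, making the attention matrix low-rank. All names, shapes, and the rank value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankSelfAttention(nn.Module):
    """Single-head attention whose keys and values are compressed along the
    token axis (N -> rank) by a learned projection, so the attention matrix
    is N x rank instead of N x N."""
    def __init__(self, dim, num_tokens, rank=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(num_tokens, rank)   # compresses the token axis
        self.scale = dim ** -0.5

    def forward(self, x):                          # x: (B, N, C)
        q, k, v = self.q(x), self.k(x), self.v(x)
        k = self.proj(k.transpose(1, 2)).transpose(1, 2)   # (B, rank, C)
        v = self.proj(v.transpose(1, 2)).transpose(1, 2)   # (B, rank, C)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, N, rank)
        return attn @ v                            # (B, N, C)

x = torch.randn(2, 196, 128)
print(LowRankSelfAttention(128, num_tokens=196)(x).shape)  # torch.Size([2, 196, 128])
```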
Lightweight multi-scale global attention enhancement network for image super-resolution
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-23 | DOI: 10.1016/j.imavis.2025.105671
Yue Huang, Pan Wang, Yumei Zheng, Bochuan Zheng

Transformer-based deep models have achieved impressive results in image super-resolution (SR). However, these algorithms still face a series of problems: redundant attention operations lead to low resource utilization, and the sliding-window mechanism limits the ability to capture multi-scale feature information. To address these issues, this paper proposes a lightweight multi-scale global attention enhancement network (LMGAE-Net). Specifically, to overcome the window limitations of Transformer models, we introduce a multi-scale global attention block (MGAB), which significantly enhances the model's ability to capture long-range information by grouping input features and computing self-attention with varying window sizes. In addition, we propose a multi-group shift fusion block (MSFB), which divides features into equal groups and shifts them in different spatial directions. While keeping the parameter count equivalent to a 1×1 convolution, it expands the receptive field, improves the learning and fusion of local features, and further enhances the network's ability to recover image details. Extensive experiments demonstrate that LMGAE-Net outperforms state-of-the-art lightweight SR methods by a large margin.

Image and Vision Computing, Volume 162, Article 105671 | Citations: 0
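The multi-group shift fusion idea (split the channels into groups, shift each group in a different spatial direction, and fuse with a 1×1 convolution) can be sketched directly; the block below is a toy version with four groups and one-pixel shifts, not the published MSFB, and all names are illustrative.

```python
import torch
import torch.nn as nn

class GroupShiftFusion(nn.Module):
    """Toy shift-fusion block: channels are split into four groups, each group is
    shifted one pixel in a different spatial direction (zero-padded), and a 1x1
    convolution fuses the shifted groups. The only learnable parameters are
    those of the 1x1 convolution."""
    def __init__(self, dim):
        super().__init__()
        assert dim % 4 == 0
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                     # x: (B, C, H, W)
        g = x.shape[1] // 4
        shifted = torch.zeros_like(x)
        shifted[:, 0*g:1*g, :, 1:] = x[:, 0*g:1*g, :, :-1]   # shift right
        shifted[:, 1*g:2*g, :, :-1] = x[:, 1*g:2*g, :, 1:]   # shift left
        shifted[:, 2*g:3*g, 1:, :] = x[:, 2*g:3*g, :-1, :]   # shift down
        shifted[:, 3*g:4*g, :-1, :] = x[:, 3*g:4*g, 1:, :]   # shift up
        return self.fuse(shifted)

x = torch.randn(1, 64, 32, 32)
print(GroupShiftFusion(64)(x).shape)          # torch.Size([1, 64, 32, 32])
```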
Modal-aware contrastive learning for hyperspectral and LiDAR classification
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-21 | DOI: 10.1016/j.imavis.2025.105669
Liangyu Zhou, Xiaoyan Luo, Rui Xue

Contrastive learning, as a self-supervised learning method, has received significant attention in hyperspectral image (HSI) and light detection and ranging (LiDAR) data classification. However, current contrastive-learning-based methods ignore the large gap between HSI and LiDAR data in their ability to discriminate ground objects. To fully exploit the potential of HSI in the spectral domain and LiDAR in the spatial domain, we propose a modal-aware contrastive learning (MACL) framework, which learns discriminative multimodal features in both the spatial and spectral domains. First, we design a modal-aligned sample-pair construction strategy to ensure that the data structure and characteristics of the constructed spectral and spatial sample pairs remain consistent. Then, spectral and spatial branches based on contrastive learning are adopted to extract multimodal spectral and spatial features in the pre-training stage. Finally, a multimodal attentional feature fusion (MAFF) module is designed to integrate and fuse the multimodal features for the downstream classification task; its parameters are fine-tuned with a small amount of labeled data. Experimental results on three public datasets, i.e., MUUFL, Trento, and Houston2013, demonstrate that our method outperforms several state-of-the-art methods in terms of qualitative and quantitative analysis. Our source code is available at https://github.com/zlyrs1/MACL.

Image and Vision Computing, Volume 162, Article 105669 | Citations: 0
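The pre-training branches are contrastive; a generic InfoNCE objective over paired spectral and spatial embeddings, shown below, illustrates the kind of loss involved. The modal-aligned pair-construction strategy itself is not reproduced, and the batch size, temperature, and embedding width are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss: z_a[i] and z_b[i] are embeddings of two views of the
    same sample (e.g., a spectral view and a spatial view); every other pairing
    in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature           # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

spectral = torch.randn(32, 128)   # embeddings from a spectral branch
spatial = torch.randn(32, 128)    # embeddings from a spatial branch
print(info_nce(spectral, spatial))  # scalar pre-training loss
```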
Combining spatio-temporal attention and multi-level feature fusion for video saliency prediction
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-21 | DOI: 10.1016/j.imavis.2025.105678
Huiyu Luo

Recently, 3D-convolution-based video saliency prediction models have adopted a fully convolutional encoder-decoder architecture to extract multi-level spatio-temporal features and have achieved impressive performance. Deep-level features encompass semantic information reflecting salient regions, while shallow-level features contain detailed information. However, these models have two issues: they fail to capture global information, and the equally weighted fusion mechanism they employ ignores the differences between deep and shallow features. To address these issues, we propose a novel model that combines spatio-temporal attention and multi-level feature fusion, built on two main components: the global spatio-temporal correlation (GSC) structure and the attention-guided fusion (AGF) module. The GSC structure employs the Video Swin Transformer to capture global spatio-temporal correlations from the deepest local spatio-temporal features through multi-head attention. Rather than fusing features with equal weights, the proposed AGF module adaptively computes an attention map from deep-level features alone through spatio-temporal attention and channel attention branches, which guides the features to focus on salient regions and fuse. Extensive experiments over four datasets demonstrate that the proposed model achieves comparable performance against state-of-the-art models and confirm the effectiveness of each component.

Image and Vision Computing, Volume 162, Article 105678 | Citations: 0
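The AGF module computes an attention map from deep-level features only and uses it to guide the fusion with shallower features. The sketch below is a much-simplified 2D stand-in for that idea (a single spatial attention branch, no channel attention, no temporal axis); all names and shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedFusion(nn.Module):
    """Toy attention-guided fusion: a spatial attention map is predicted from the
    deep feature alone and used to re-weight the shallow feature before the two
    are summed. This replaces equally weighted fusion with a learned weighting."""
    def __init__(self, deep_ch, shallow_ch):
        super().__init__()
        self.align = nn.Conv2d(deep_ch, shallow_ch, 1)     # match channel widths
        self.attn = nn.Conv2d(deep_ch, 1, 3, padding=1)    # 1-channel spatial map

    def forward(self, deep, shallow):      # deep: (B,Cd,h,w), shallow: (B,Cs,H,W)
        deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                                align_corners=False)
        a = torch.sigmoid(self.attn(deep_up))   # attention from deep features only
        return self.align(deep_up) + a * shallow

deep = torch.randn(1, 256, 14, 14)
shallow = torch.randn(1, 64, 56, 56)
print(AttentionGuidedFusion(256, 64)(deep, shallow).shape)  # torch.Size([1, 64, 56, 56])
```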
Augmenting and mixing Transformers with synthetic data for image captioning
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-21 | DOI: 10.1016/j.imavis.2025.105661
Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Image captioning has attracted significant attention within the Computer Vision and Multimedia research domains, resulting in the development of effective methods for generating natural language descriptions of images. Concurrently, the rise of generative models has facilitated the production of highly realistic, high-quality images, particularly through recent advances in latent diffusion models. In this paper, we propose to leverage these advances in Generative AI to create additional training data that can effectively boost the performance of an image captioning model. Specifically, we combine real images with their synthetic counterparts generated by Stable Diffusion using a Mixup data augmentation technique to create novel training examples. Extensive experiments on the COCO dataset demonstrate the effectiveness of our solution in comparison to different baselines and state-of-the-art methods, and validate the benefits of using synthetic data to augment the training stage of an image captioning model and improve the quality of the generated captions. Source code and trained models are publicly available at https://github.com/aimagelab/synthcap_pp.

Image and Vision Computing, Volume 162, Article 105661 | Citations: 0
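The data augmentation itself is straightforward to sketch: a real COCO image is blended with its Stable Diffusion counterpart using a Mixup coefficient. The alpha value and the exact blending recipe below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import torch

def mixup_real_synthetic(real, synthetic, alpha=0.2):
    """Mixup-style augmentation: blend a real image with a synthetic counterpart
    using a Beta-distributed coefficient. Both tensors are (C, H, W) in [0, 1]."""
    lam = np.random.beta(alpha, alpha)
    return lam * real + (1.0 - lam) * synthetic, lam

real = torch.rand(3, 224, 224)        # real COCO image
synthetic = torch.rand(3, 224, 224)   # Stable Diffusion image for the same caption
mixed, lam = mixup_real_synthetic(real, synthetic)
print(mixed.shape, round(float(lam), 3))  # the mixed image keeps the original caption
```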
Explainable retinal disease classification and localization through Convolutional Neural Networks
IF 4.2 | CAS Tier 3, Computer Science
Image and Vision Computing | Pub Date: 2025-07-19 | DOI: 10.1016/j.imavis.2025.105667
Marcello Di Giammarco, Antonella Santone, Mario Cesarelli, Fabio Martinelli, Francesco Mercaldo

Retinal diseases pose significant challenges to vision globally, affecting a substantial portion of the population. The reliance on expert clinicians for interpreting Optical Coherence Tomography (OCT) images underscores the need for an automated diagnostic process. In this paper, we propose a method to automatically detect and localize retinal disease with deep convolutional neural networks, starting from the analysis of OCT imaging. In detail, we propose and design a novel deep learning model, FCNNplus, for the retinal disease classification task, reaching 93.3% accuracy. The focus is not only on achieving a satisfactory retinal disease diagnosis but also on emphasizing the role of CAM algorithms in localizing disease-specific patterns, so that the method accounts for the explainability and reliability behind each prediction. FCNNplus produces precise and accurate heatmap localizations, correctly identifying the presence of retinal disease in the images. We also consider a similarity index aimed at enhancing the qualitative assessment and providing a measure of the visual explanation derived from the heatmaps (i.e., the areas of the image under analysis that, from the model's point of view, are symptomatic of a certain prediction).

Image and Vision Computing, Volume 162, Article 105667 | Citations: 0
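The heatmap localization relies on CAM-style algorithms. The sketch below computes a standard class activation map from last-layer convolutional features and classifier weights, as a generic example of the technique rather than the authors' exact pipeline; the feature shapes and the number of OCT classes are assumptions.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx, out_size):
    """Standard CAM: weight the final convolutional feature maps by the classifier
    weights of the chosen class and upsample to image resolution.
    features: (B, C, h, w); fc_weight: (num_classes, C)."""
    w = fc_weight[class_idx].view(1, -1, 1, 1)             # (1, C, 1, 1)
    cam = (features * w).sum(dim=1, keepdim=True)          # (B, 1, h, w)
    cam = F.relu(cam)
    cam = F.interpolate(cam, size=out_size, mode="bilinear", align_corners=False)
    return (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)  # normalize to [0, 1]

feats = torch.randn(1, 512, 7, 7)   # last-conv features of a hypothetical CNN
fc_w = torch.randn(4, 512)          # classifier weights, 4 OCT classes assumed
print(class_activation_map(feats, fc_w, class_idx=2, out_size=(224, 224)).shape)
```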