Image and Vision Computing: Latest Articles

STIFormer: RGB-T tracking via Spatial–Temporal Interaction Transformer
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-11 DOI: 10.1016/j.imavis.2026.105929
Boyue Xu, Yaqun Fang, Ruichao Hou, Tongwei Ren
Existing RGB-Thermal (RGB-T) trackers integrate the RGB and thermal modalities using cross-attention, and estimate the object position by computing the correlation between a single template and the search region. However, many trackers yield unsatisfactory performance because they disregard inter-frame cues between modalities and dynamic changes in the dominant modality. To address this issue, we propose a novel Spatial-Temporal Interaction Transformer, called STIFormer, which effectively merges multi-modal features from both the spatial and temporal domains, enhancing the robustness of RGB-T tracking. In particular, a spatial-temporal feature representation module is proposed to facilitate inter-frame information exchange through token propagation, encoding features from multiple frames together with a temporal token. In addition, a token-guided mixed attention fusion module is proposed to fuse the frame features and token features from different modalities. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on public RGB-T benchmarks. The project page is available at: https://github.com/xuboyue1999/STIFormer.
Citations: 0
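The token-propagation idea in the STIFormer abstract (a temporal token that carries inter-frame information by attending over frame features) can be sketched with a generic single-query attention step. This is a minimal pure-Python illustration under assumed simplifications (single head, no learned projections), not the paper's implementation; the function names are invented for this sketch.

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector (pure Python).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors, dimension by dimension.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def propagate_token(token, frame_features):
    # Update the temporal token by attending over per-frame feature vectors.
    return attend(token, frame_features, frame_features)
```

With identical frame features, the updated token simply converges to that shared feature, which makes the propagation step easy to sanity-check.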
Mamba-Driven Topology Fusion for monocular 3D human pose estimation
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-06 DOI: 10.1016/j.imavis.2026.105927
Zenghao Zheng , Lianping Yang , Jinshan Pan , Hegui Zhu
The Mamba model has gradually garnered widespread attention in 3D human pose estimation tasks due to its linear time scaling capability and excellent expressive power. However, the Mamba model exhibits deficiencies in handling human body topological structures, as its internal state space model and one-dimensional causal convolutional network have inherent design limitations in processing global topological sequences and local structures. To address these issues, we propose the Mamba-Driven Topology Fusion framework. For global topological guidance of the Mamba, we design a Bone Aware Module to deliver directional and length guidance of human skeletons in the spherical coordinate system. To capture dependencies between local joints, we enhance the convolutional structure within the Mamba by integrating forward and backward graph convolutional networks. Additionally, a Bone-Joint Fusion Embedding and a Spatiotemporal Refinement Module are proposed to fuse global skeletal and keypoint information and to extract spatiotemporal features, respectively. The proposed Mamba-Driven Topology Fusion framework effectively alleviates the Mamba model's incompatibility with the topological structures of human keypoints. We conduct extensive experiments on the Human3.6M and MPI-INF-3DHP datasets for evaluation and comparison, and the results demonstrate that the proposed method significantly reduces computational cost while achieving higher accuracy. Our model and code are available at https://github.com/ZenghaoZheng/MDTF-3DHPE.
Citations: 0
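The Bone Aware Module described in the abstract above encodes bone direction and length in spherical coordinates. A common parameterization (length, azimuth in the x-y plane, polar angle from the +z axis) can be sketched as follows; the exact axis convention used in the paper is an assumption here.

```python
import math

def bone_to_spherical(parent, child):
    # Convert a bone (the offset from parent joint to child joint) into
    # spherical coordinates: length r, azimuth theta in the x-y plane,
    # and polar angle phi measured from the +z axis.
    dx, dy, dz = (c - p for c, p in zip(child, parent))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.atan2(dy, dx)
    phi = math.acos(dz / r) if r > 0 else 0.0
    return r, theta, phi
```

For example, a bone pointing straight up the z-axis has zero azimuth and zero polar angle, while its length is the parent-child distance.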
Non-target information also matters: InverseFormer tracker for single object tracking
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-08 DOI: 10.1016/j.imavis.2026.105922
Qiuhang Gu , Baopeng Zhang , Zhu Teng , Hongwei Xu
Visual object tracking has been significantly improved by Transformer-based methods. However, most existing trackers perform target-oriented inference, which enhances target-relevant features while ignoring non-target features. We argue that non-target information also contains abundant clues that can provide significant guidance for tracking inference. In this work, we propose a novel InverseFormer tracker constructed by stacking multiple InverseFormer blocks. The proposed InverseFormer block consists of a context aggregation unit and an inverse enhancement unit. The former aggregates local context correlation information while boosting tracking efficiency. The latter enhances the template-search image pair by using non-target information in the search region, which significantly suppresses background-relevant features while preserving target details, leading to more accurate tracking. Extensive experiments conducted on seven benchmarks demonstrate that our tracker outperforms state-of-the-art methods at a real-time speed of 45 FPS.
Citations: 0
Efficient ultra-lightweight convolutional attention network for embedded identity document recognition system
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-10 DOI: 10.1016/j.imavis.2026.105930
Yehu Shen , Jikun Wei , Xuemei Niu , Guizhong Fu , Zihe Cao
With the rapid development of IoT, identity document recognition has been widely applied in various fields. Efficient recognition systems are crucial for deployment on resource-constrained embedded devices, but many deep learning models suffer from high computational complexity. We propose an efficient character recognition system with a two-stage framework: a document number detection network and an ultra-lightweight attention-based recognition network named EULCAN (Efficient Ultra-Lightweight Convolutional Attention Network). EULCAN's feature extraction module employs a novel Dense Simplified Convolutional Attention Module (DSCAM) and a Dual Dimensionality Reduction Block (DDRB) to capture discriminative features efficiently. DSCAM combines an Efficient Bottleneck Convolution Block and a Simplified Channel Attention Block, significantly reducing computational costs while maintaining accuracy. For sequence transcription, a simple fully connected layer coupled with a Connectionist Temporal Classification (CTC) layer is used for robust recognition. Evaluated on the BDCI benchmark and a real-world SUST dataset, EULCAN achieves competitive accuracies of 97.1% and 95.3%, respectively, while maintaining only 2.8 M parameters and 0.497 GFLOPs. Compared to MobileNetV3, the second most lightweight deployment-ready model, EULCAN improves accuracy by 11.7%, while its parameter size is only 0.6% of OmniParser, the most accurate model. Furthermore, the proposed identity document recognition system has been successfully deployed in real-world scenarios. On the RK3588S2 development board, EULCAN achieves an impressive inference speed of 65 FPS, demonstrating its practicality for embedded IoT applications. The source code is publicly available at https://github.com/ymxb1/EULCAN.
Citations: 0
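The abstract above pairs a fully connected layer with a CTC layer for sequence transcription. The standard CTC greedy decoding rule (argmax per time step, collapse consecutive repeats, drop blanks) can be sketched as follows; the blank index and charset layout are illustrative assumptions, not details from the paper.

```python
def ctc_greedy_decode(logits, charset, blank=0):
    # logits: one list of per-class scores per time step.
    # charset: indexable mapping from class index to character.
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    out, prev = [], blank
    for idx in best:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```

For instance, the per-step argmax sequence [1, 1, blank, 2, 2] decodes to the two-character string formed by classes 1 and 2, since repeats collapse and blanks separate genuine repetitions.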
DSAC-Hash: Distribution-Similarity-Aware Cross-modal Hashing
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-09 DOI: 10.1016/j.imavis.2026.105926
Mutaz Ibrahim Mohammed Ahmed Ibrahim , Dejiao Niu , Tao Cai , Lei Li , Bilal Ahmad
The rapid growth of online multimedia data has made cross-modal hashing crucial for efficient retrieval. Existing methods often fail to handle the heterogeneity of image and text data and lack sufficient semantic interaction, resulting in reduced retrieval accuracy. To address these issues, we introduce the DSAC-Hash framework, which includes an innovative Semantic Interaction Aggregator (SIA) that refines inter- and intra-modal relationships, reducing semantic discrepancies and enhancing retrieval performance. Additionally, we present a unified weighted loss framework that optimizes cross-modal similarity by incorporating weighted triplet, contrastive, and semantic loss functions, improving the quality of binary hash codes. These enhancements significantly boost image-to-text (I2T) and text-to-image (T2I) retrieval performance. Experiments on MS COCO, MIRFlickr-25K, and NUS-WIDE show that DSAC-Hash achieves state-of-the-art performance, with MAP improvements of 4.59–10.45% (I2T) and 7.39–12.96% (T2I) on MS COCO, 1.52–8.81% (I2T) and 2.75–7.34% (T2I) on MIRFlickr-25K, and 4.78–7.74% (I2T) and 7.03–9.42% (T2I) on NUS-WIDE, confirming its robustness, scalability, and effectiveness in large-scale multimedia retrieval scenarios.
Citations: 0
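Retrieval with binary hash codes of the kind DSAC-Hash produces ultimately reduces to ranking database items by Hamming distance to the query code. A minimal brute-force sketch (bit-list codes; not the paper's code, and real systems use packed bitwise operations):

```python
def hamming(a, b):
    # Hamming distance between two equal-length binary codes (bit lists).
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database, top_k=2):
    # Rank database codes by Hamming distance to the query;
    # return the indices of the top_k closest codes.
    ranked = sorted(range(len(database)),
                    key=lambda i: hamming(query_code, database[i]))
    return ranked[:top_k]
```

Because Hamming distance on short codes is a handful of XOR/popcount operations in practice, this ranking is what makes hashing-based retrieval fast at scale.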
A lightweight shallow convolution neural network for automatic identification of Diabetic Foot Ulcers
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-01-31 DOI: 10.1016/j.imavis.2026.105925
Sujit Kumar Das , Parag Bhuyan , Nageswara Rao Moparthi , Suyel Namasudra
In standard clinical practice, disease diagnosis demands expensive tests and time-consuming procedures. Additionally, manual inspection by clinicians may sometimes lead to incorrect diagnostic results. Accurate identification of Diabetic Foot Ulcers (DFUs) is essential for early intervention and reducing the risk of serious complications. The evolution of deep learning techniques in image analysis has made significant contributions over the last decade. However, designing a computationally efficient and cost-effective deep learning network remains a challenge. This study proposes a lightweight and computationally efficient Convolutional Neural Network (CNN) architecture for automatic DFU classification. The proposed model primarily consists of varying-sized convolution kernels connected in parallel, positional encoding (PE), and aggregated pooling (AGP) to enhance both global and local feature representation while maintaining a shallow and resource-efficient design. The proposed network is evaluated on publicly available DFU datasets and benchmarked against widely used deep learning models. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods with the highest average F1-Scores of 94.83%, 94.63%, and 99.49% for DFU, infection, and ischaemia identification, respectively. The results also indicate that the proposed CNN achieves superior performance with significantly reduced computational cost, making it suitable for deployment on low-power and IoT-enabled medical devices.
Citations: 0
SRformer: A hybrid semantic-regional transformer for indoor 3D object detection
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-01-27 DOI: 10.1016/j.imavis.2026.105919
Kunpeng Bi, Shuang Wang, Xiangyang Jiang, Miaohui Zhang
Detection transformers have been widely applied to 3D object detection, achieving impressive results in various scenarios. However, effectively fusing regional and semantic features in query selection and cross-attention remains a challenge. This paper systematically analyzes detection transformers and proposes SRformer, a novel two-stage 3D object detector with several key designs. First, SRformer introduces a Hybrid Query Selector (HQS), which splits the first stage into a prediction branch and a sampling branch. The sampling branch is supervised by a novel hybrid query loss based on regional and semantic features, thereby filtering out high-quality initial query boxes. Next, a Regional Reinforcement Attention (RRA) is introduced to enhance instance-level attention. The RRA learns a set of key points and maps their regional differences to a relative coordinate table to construct explicit instance-level regional context feature constraints, thereby modulating the cross-attention map. Additionally, a Top-K Bipartite Graph Matching (KBM) is introduced to increase the number of positive samples and enhance training stability, along with a Residual-based Bounding Box Decoder (RBBD) that parameterizes the bounding box into residual components relative to predefined base sizes for more robust and precise regression. Extensive experiments on the challenging ScanNetV2 and SUN RGB-D datasets demonstrate the effectiveness and robustness of SRformer, which achieves a new state-of-the-art result on ScanNetV2, with 76.8 mAP25 and 64.8 mAP50.
Citations: 0
RelPose-TTA: Energy-based relative pose correction for test-time adaptation of category-level object pose estimation
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-02-07 DOI: 10.1016/j.imavis.2026.105928
Yue Zhan , Xin Wang , Zhaoxiang Liu , Shiguo Lian , Tangwen Yang
Category-level object pose estimation is fundamental for robotic grasping and manipulation, yet models trained on synthetic data often generalize poorly to real-world environments due to substantial domain gaps. Test-time adaptation (TTA) offers a promising solution to address this challenge, but existing methods frequently depend on noisy pseudo-labels or complex optimization, which can lead to performance degradation and error accumulation over time. In this paper, we propose RelPose-TTA, a test-time adaptation framework that improves generalization and long-term stability for category-level object pose estimation in previously unseen real-world environments. The core idea is to exploit the relative motion between consecutive frames, which is typically more stable and reliable than single-frame absolute pose estimation, and to use it as a self-supervisory signal during inference. Concretely, RelPose-TTA introduces an energy-based relative pose corrector to model inter-frame motion and mitigate ambiguities induced by occlusions, object symmetries, and large viewpoint changes. During test-time adaptation, the corrector is updated online via contrastive learning and is tightly coupled with point cloud registration, so that refined relative pose estimates can effectively guide absolute pose refinement. Extensive experiments demonstrate that RelPose-TTA consistently outperforms prior TTA methods in unseen real-world settings, while substantially reducing long-term drift and maintaining stable pose predictions.
Citations: 0
DR-TrustNet: Enhancing diabetic retinopathy detection using reliable efficient networks and uncertainty quantification
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-01-31 DOI: 10.1016/j.imavis.2026.105921
Preeti Verma , Sivasankar Elango , Kunwar Singh
Diabetic retinopathy (DR) is one of the main causes of vision loss, and catching it early is key to preventing permanent damage. Currently, screening relies on manual inspection by clinicians, which is time-consuming and not always consistent. Deep neural networks (DNNs) have enabled high-precision DR detection, but there are concerns: these models can be over-confident in their predictions, leading to mistakes that are especially costly in healthcare. Another problem is that current deep learning methods do not handle uncertainty well, which makes them difficult to trust in real medical environments. To address these challenges, we developed a system with three components. First, we improve the quality of retinal images using the Adaptive Fundus Enhancement Pipeline (AFEP). Then we extract more useful features from the images using a modified version of EfficientNet-B0. Finally, we calibrate the model's predictions so that its confidence levels are actually accurate; this step reduces the chance of incorrect diagnosis by using test-time data augmentation and temperature scaling. Results on the IDRiD dataset were promising: the model achieved 96% accuracy and showed much better uncertainty calibration, with an expected calibration error of only 0.030. In other words, it is not only accurate but also more reliable in the real world. Overall, our methodology can make AI-based DR screening more practical and reliable for both doctors and patients.
Citations: 0
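Temperature scaling, one of the calibration steps named in the abstract above, divides the logits by a temperature T before the softmax; T > 1 softens over-confident predictions without changing the predicted class. A minimal sketch (the temperature value below is illustrative; in practice T is fitted on a held-out validation set):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax. Dividing logits by T > 1 flattens the
    # distribution (lower max confidence) while leaving the argmax intact.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Comparing `softmax(logits)` with `softmax(logits, temperature=2.0)` shows the top probability shrinking while the ranking of classes stays the same, which is exactly the calibration effect being exploited.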
HOPE: Histopathological image Organization and Processing Environment
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2026-04-01 Epub Date: 2026-01-30 DOI: 10.1016/j.imavis.2026.105924
Daniel Riccio, Mara Sangiovanni, Francesco Longobardi, Andrea Francesco Scalella, Vincenzo Manfredi
In disciplines such as digital pathology, the management of vast amounts of data, primarily ultra-high-resolution images, remains a significant barrier to the widespread adoption and seamless sharing of knowledge. Current research efforts are heavily focused on image encoding, often overlooking equally critical aspects such as indexing and efficient content transmission. Traditional compression methods, such as JPEG2000, prioritize reconstruction quality but do not inherently support direct retrieval or progressive transmission, both of which are essential for applications like telemedicine and large-scale digital pathology archives. To bridge this gap, we introduce a novel framework that integrates fractal compression, deep learning-based retrieval, and adaptive transmission, optimizing not only storage efficiency but also accessibility and scalability in histopathological imaging.

The Histopathological image Organization and Processing Environment (HOPE) framework proposed here exploits Partitioned Iterated Function Systems for image compression, achieving high compression ratios while preserving essential structural details. To mitigate the inherent artifacts of fractal compression, a U-Net autoencoder is integrated, refining decompressed images and enhancing visual quality. Additionally, a residual encoding mechanism is employed, allowing for lossless reconstruction when necessary. Unlike conventional methods, this framework enables direct retrieval from the compressed domain by extracting discriminative features from the fractal encoding coefficients. Another key innovation is its progressive transmission capability, which allows an initial low-bitrate preview to be sent, followed by incremental quality refinements based on diagnostic needs. This significantly reduces network load and enables real-time access to high-resolution histopathological images on resource-limited devices. Experimental results demonstrate that the proposed framework achieves compression performance comparable to JPEG2000, while simultaneously enabling efficient indexing, high-accuracy retrieval, and scalable transmission.

Citations: 0