Image and Vision Computing: Latest Articles

A hybrid approach combining images and questionnaires for early detection and severity assessment of Autism Spectrum Disorder
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-23 · DOI: 10.1016/j.imavis.2025.105547
Rajkumar S.C., Stefano Cirillo, Yuvasini D., Luisa Solimando
Abstract: In this research, we propose a novel integrated system for the early diagnosis and cognitive enhancement of infants with Autism Spectrum Disorder (ASD). The system combines two core modules: the Behavioral Analytic Module and the Cognitive Skill Enhancement Module. The Behavioral Analytic Module includes a Questionnaire Analysis Sub-module, which utilizes Random Forest classifiers to analyze questionnaire data, and an Image Analysis Sub-module, which employs a fine-tuned VGG16 Convolutional Neural Network to process facial images. These sub-modules independently assess ASD likelihood and combine their outputs to generate a comprehensive diagnosis using a weighted averaging technique. The Cognitive Skill Enhancement Module integrates interactive games and web-based animations designed to improve cognitive abilities and daily living skills in toddlers with ASD. Additionally, a reward system is incorporated to reinforce learning outcomes, adaptively calculating rewards based on the infants’ progress. The proposed system aims to provide a holistic approach to ASD diagnosis and intervention, offering an effective tool for early detection and tailored cognitive development. The system’s efficacy is demonstrated through comparative analysis, showing a 93% improvement in diagnostic accuracy and a 92% enhancement in cognitive skill development among toddlers with ASD.
Image and Vision Computing, Volume 160, Article 105547
Citations: 0
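A minimal sketch of the weighted-averaging fusion step described above, assuming a fitted scikit-learn Random Forest, a PyTorch VGG16 with a two-class head, and equal weights of 0.5; the paper does not publish its exact weights or interfaces here, so these names and values are placeholders.

```python
# Illustrative sketch (assumed weights and interfaces): fuse the questionnaire
# classifier's ASD probability with the image classifier's probability by
# weighted averaging. `rf_model` is a fitted sklearn RandomForestClassifier;
# `cnn_model` is a fine-tuned VGG16 with a two-logit head in PyTorch.
import numpy as np
import torch

def fused_asd_probability(rf_model, cnn_model, questionnaire_row, face_tensor,
                          w_questionnaire=0.5, w_image=0.5):
    """Weighted average of the two sub-modules' ASD probabilities."""
    # Probability of the positive (ASD) class from the Random Forest.
    p_rf = rf_model.predict_proba(questionnaire_row.reshape(1, -1))[0, 1]
    # Probability from the CNN: softmax over the two logits.
    cnn_model.eval()
    with torch.no_grad():
        logits = cnn_model(face_tensor.unsqueeze(0))        # shape (1, 2)
        p_cnn = torch.softmax(logits, dim=1)[0, 1].item()
    return w_questionnaire * p_rf + w_image * p_cnn
```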
Mmi-Unet: Colorectal cancer CT image segmentation based on multi-modal information interaction
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-23 · DOI: 10.1016/j.imavis.2025.105583
Zihao Zhao, Dinghui Wu, Qibing Zhu, Hao Wang, Yuxi Ge, Shudong Hu
Abstract: Colorectal cancer (CRC) segmentation from computed tomography (CT) images remains challenging, primarily due to low contrast and the irregular morphology of tumorous lesions. Existing multi-modal methods are often constrained by simplistic feature concatenation strategies, which limit the exploitation of collaborative information across modalities. Such limitations become increasingly pronounced when dealing with complex anatomical structures and highly heterogeneous lesions. To address these challenges, we propose a novel multi-modal segmentation model, referred to as multimodal interaction Unet (Mmi-Unet). Our approach employs separate ResNet encoders to extract modality-specific features, thereby preserving their independence, and leverages cross-attention mechanisms along with information entropy to capture inter-modality synergy. In addition, we introduce a dynamic fusion coefficient training module, enabling flexible adjustment of modality fusion ratios to achieve enhanced information integration. Built on a U-Net framework, Mmi-Unet further incorporates multi-scale feature fusion and collaborative optimization. Experimental results on plain and enhanced CRC imaging tasks indicate that our model surpasses existing approaches, achieving Dice coefficients of up to 0.9557 and 0.9559 and intersection-over-union (IoU) scores of up to 0.9326 and 0.9435 on the two tasks, respectively. These findings demonstrate the superior accuracy and robustness of the proposed model for CRC segmentation.
Image and Vision Computing, Volume 161, Article 105583
Citations: 0
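A minimal sketch of cross-modal attention between two encoder feature maps, in the spirit of Mmi-Unet's interaction mechanism; the paper's actual module, its information-entropy weighting, and the dynamic fusion-coefficient training are not reproduced, and the shapes and names below are assumptions.

```python
# Sketch: features from one modality attend to the other modality's features.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) maps from two ResNet encoders.
        b, c, h, w = feat_a.shape
        q = feat_a.flatten(2).transpose(1, 2)   # (B, HW, C): queries from modality A
        kv = feat_b.flatten(2).transpose(1, 2)  # (B, HW, C): keys/values from modality B
        fused, _ = self.attn(q, kv, kv)         # modality A attends to modality B
        fused = self.norm(fused + q)            # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse two 64-channel feature maps of the same spatial size.
fa, fb = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(CrossModalAttention(64)(fa, fb).shape)    # torch.Size([2, 64, 32, 32])
```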
BFNet: Boundary guidance signal and feature fusion network for camouflaged object detection
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-23 · DOI: 10.1016/j.imavis.2025.105599
Xinglin Fu, Weixin Bian, Biao Jie, Haotong Dong, Zhiwei He
Abstract: The purpose of Camouflaged Object Detection (COD) is to identify objects that are visually indistinguishable from their backgrounds due to high similarities in color, texture, and luminance. This task presents greater challenges compared to conventional object detection because of the intricate blending between objects and their surroundings. In this paper, a boundary guidance signal and feature fusion network (BFNet) for camouflaged object detection is proposed. The proposed method mainly consists of three key components: a boundary guidance signal module (BGSM), an attention-induced feature fusion module (AFFM), and a gradual camouflage recognition module (GCRM). BGSM captures edge information to generate edge guidance signals. AFFM fuses cross-level features of shallow and deeper layers to obtain rich details and semantic information. Lastly, GCRM refines the detection of camouflaged objects step by step to obtain the final prediction map. To verify the effectiveness of the proposed method, experiments were conducted on four challenging benchmark datasets. The experimental results show that BFNet significantly outperforms 16 existing methods under five widely used evaluation metrics.
Image and Vision Computing, Volume 160, Article 105599
Citations: 0
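A minimal sketch of boundary-guided feature gating, illustrating the general idea of an edge guidance signal; the BGSM, AFFM, and GCRM designs are the paper's own and are not reproduced here.

```python
# Sketch: an edge-prediction head produces a boundary map that re-weights
# the backbone features toward predicted object boundaries.
import torch
import torch.nn as nn

class BoundaryGuidedGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.edge_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feat):
        edge = torch.sigmoid(self.edge_head(feat))   # boundary map in [0, 1]
        # Emphasize responses near predicted boundaries; keep a residual path.
        return feat * (1.0 + edge), edge

feat = torch.randn(1, 32, 64, 64)
gated, edge_map = BoundaryGuidedGate(32)(feat)
print(gated.shape, edge_map.shape)  # (1, 32, 64, 64) and (1, 1, 64, 64)
```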
FaiResGAN: Fair and robust blind face restoration with biometrics preservation
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-22 · DOI: 10.1016/j.imavis.2025.105575
George Azzopardi, Antonio Greco, Mario Vento
Abstract: Modern computer vision technologies enable systems to detect, recognize, and analyze facial features, but challenges arise when images are noisy, blurred, or of low quality. Blind face restoration, which aims to recover high-quality facial images without prior knowledge of the degradation, addresses this issue. In this paper, we introduce Fair Restoration GAN (FaiResGAN), a novel Generative Adversarial Network (GAN) designed to balance face restoration with the preservation of soft biometrics (identity, ethnicity, age, and gender). Our model incorporates a pseudo-random batch composition algorithm to promote fairness and mitigate bias, alongside a realistic degradation model simulating corruptions typical of surveillance images. Experimental results show that FaiResGAN outperforms state-of-the-art blind face restoration methods, both quantitatively and qualitatively. A user study involving 40 participants showed that FaiResGAN-restored images were preferred by 70% of users. Additionally, tests on the VGGFace2, UTKFace, and FairFace datasets demonstrate FaiResGAN’s superior performance in preserving soft biometric attributes and ensuring fair restoration across genders and ethnicities.
Image and Vision Computing, Volume 160, Article 105575
Citations: 0
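A hedged sketch of one plausible pseudo-random, demographically balanced batch composition; the paper's actual algorithm may differ, and the grouping interface below is an assumption.

```python
# Sketch: draw (near-)equally from each demographic group per batch, with a
# seeded RNG so the composition is pseudo-random but reproducible.
import random
from collections import defaultdict

def balanced_batches(samples, group_of, batch_size, seed=0):
    """Yield batches drawing equally from each group.

    samples:  list of sample identifiers
    group_of: dict mapping sample id -> group label (e.g., gender/ethnicity)
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for s in samples:
        pools[group_of[s]].append(s)
    for pool in pools.values():
        rng.shuffle(pool)
    groups = list(pools)
    per_group = max(1, batch_size // len(groups))
    while all(len(pools[g]) >= per_group for g in groups):
        batch = []
        for g in groups:
            batch.extend(pools[g].pop() for _ in range(per_group))
        rng.shuffle(batch)   # avoid a fixed group order inside the batch
        yield batch
```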
SAGNet: Synergistic Attention-Graph Network for video salient object detection
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-19 · DOI: 10.1016/j.imavis.2025.105570
Huo Lina, Xueyuan Gao, Wei Wang, Ke Chen, Ke Wang
Abstract: In the field of video salient object detection (VSOD), accurately capturing motion information is essential. Previous approaches primarily rely on optical flow, convolutional long short-term memory (ConvLSTM), or 3D convolutional neural networks (CNNs) to extract and utilize motion information. However, these methods capture limited motion details and increase the number of parameters in the network. Moreover, Transformer-based methods, while effective in high-level feature modeling, suffer from excessive computational complexity and insufficient local feature extraction, limiting their practical application in VSOD. To address these challenges, we propose a novel synergistic attention-graph network (SAGNet) that independently distills spatial–temporal cues and spatial edge features using the synergistic attention-graph module (SAGM) and the spatial edge attention module (SEM), respectively. SAGM innovatively integrates inter-frame attention with a spatial–temporal graph convolutional network (GCN). The inter-frame attention proposed in SAGM captures motion information between video frames while expanding the receptive field to capture long-range dependencies. The spatial–temporal GCN models the video as a graph and bridges features from the temporal branch into the spatial branch, fusing cross-modal features collaboratively. This synergy enables SAGNet to consider both global and local spatial–temporal features. SEM enhances high-level information by extracting spatial and edge features from the low-level data using the Sobel operator and a spatial attention module. Experimental results on several publicly available VSOD benchmark datasets demonstrate that SAGNet outperforms existing methods in terms of detection accuracy and efficiency, confirming its superiority and practicality in VSOD.
Image and Vision Computing, Volume 160, Article 105570
Citations: 0
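A minimal sketch of Sobel-based edge extraction feeding a spatial attention gate, in the spirit of the SEM described above; the actual SAGNet module designs are the paper's own.

```python
# Sketch: fixed Sobel kernels compute a gradient-magnitude map on the
# channel-mean of a feature tensor; its sigmoid acts as a spatial attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdgeAttention(nn.Module):
    def __init__(self):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", sobel_x.view(1, 1, 3, 3))
        self.register_buffer("ky", sobel_x.t().contiguous().view(1, 1, 3, 3))

    def forward(self, feat):
        # feat: (B, C, H, W); compute edges on the channel-mean "image".
        gray = feat.mean(dim=1, keepdim=True)
        gx = F.conv2d(gray, self.kx, padding=1)
        gy = F.conv2d(gray, self.ky, padding=1)
        edge = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # Sobel gradient magnitude
        attn = torch.sigmoid(edge)                    # spatial attention in (0, 1)
        return feat * attn + feat                     # gated plus residual

print(SobelEdgeAttention()(torch.randn(1, 16, 56, 56)).shape)
```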
LCNet: Lightweight real-time image classification network based on efficient multipath dynamic attention mechanism and dynamic threshold convolution
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-13 · DOI: 10.1016/j.imavis.2025.105576
Xiaoxia Yang, Zhishuai Zheng, Huanqi Zheng, Zhedong Ge, Xiaotong Liu, Bei Zhang, Jinyang Lv
Abstract: Hybrid architectures that integrate convolutional neural networks (CNNs) with Transformers can comprehensively extract both local and global image features, exhibiting impressive performance in image classification. However, their large parameter sizes and high computational demands hinder deployment on low-resource devices. To address this limitation, we propose a dual-branch classification network based on a pyramid architecture, termed LCNet. First, we introduce a dynamic threshold convolution module that adaptively adjusts convolutional parameters based on the input, thereby improving the efficiency of feature extraction. Second, we design a multi-path dynamic attention mechanism that optimizes attention weights to capture salient information and enhance the significance of key features. Third, a star-shaped connection is adopted to enable efficient information fusion between the two branches in a high-dimensional implicit feature space. LCNet is evaluated on four public datasets and one wood dataset (Tiny-ImageNet, Mini-ImageNet, CIFAR100, CIFAR10, and Micro-CT) using recognition accuracy and inference efficiency as metrics. The results show that LCNet achieves a maximum accuracy of 99.50% with an inference time of only 0.0072 s per image, outperforming other state-of-the-art (SOTA) models. Extensive experiments demonstrate that LCNet is more competitive than existing neural networks and can be effectively deployed on low-performance computing devices. This broadens the applicability of image classification techniques, aligns with the trend of edge computing, reduces reliance on cloud servers, and enhances both real-time processing and data privacy.
Image and Vision Computing, Volume 159, Article 105576
Citations: 0
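A hedged sketch of an input-adaptive convolution in the general style of dynamic convolutions, where a gating branch derives per-channel scales from the input; LCNet's actual dynamic threshold convolution is defined in the paper and may differ.

```python
# Sketch: a shared kernel whose output is modulated by input-conditioned,
# per-channel gates computed from global context.
import torch
import torch.nn as nn

class InputAdaptiveConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global context of the input
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Sigmoid(),                      # per-channel modulation in (0, 1)
        )

    def forward(self, x):
        return self.conv(x) * self.gate(x)     # convolution scaled by input-derived gates

print(InputAdaptiveConv(16, 32)(torch.randn(1, 16, 28, 28)).shape)
```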
Deep learning-assisted 3D model for the detection and classification of knee arthritis
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-12 · DOI: 10.1016/j.imavis.2025.105574
D. Preethi, V. Govindaraj, S. Dhanasekar, K. Martin Sagayam, Syed Immamul Ansarullah, Farhan Amin, Isabel de la Torre Díez, Carlos Osorio García, Alina Eugenia Pascual Barrera, Fehaid Salem Alshammari
Abstract: Osteoarthritis (OA), a common degenerative illness that typically affects the knee joint, affects nearly 240 million people worldwide, causing pain and functional disability, especially in older adults. Knee osteoarthritis (KOA) is one of the most common and challenging medical conditions in old age. Manual diagnosis involves observing X-ray images of the knee area and classifying them into five grades; this requires the physician's expertise, suitable experience, and a lot of time, and even then the diagnosis can be prone to errors. Therefore, researchers in the machine learning (ML) and deep learning (DL) domains have employed deep neural network (DNN) models to identify and classify medical images in an automated, faster, and more accurate manner. Combining multiple imaging modalities or utilizing three-dimensional reconstructions can enhance the accuracy and completeness of the diagnostic information available from 2D images. Hence, to overcome the drawbacks of 2D imaging, the reconstruction of 3D models from 2D images is the main theme of our research. In this paper, we propose a deep learning-based model for the detection and classification of arthritis for early diagnosis. It is a four-step procedure starting with data collection, followed by data conversion, in which the proposed model deforms the target's convex hull to produce a 3D model: a series of 2D photos is utilized, along with surface rendering methods, to create the 3D model. In the third step, feature extraction is performed, followed by mesh refinement: the chamfer loss is optimized based on the rotational shape of the leg bones, so that the weight of the loss function can be allocated according to the target's geometric properties. We use a modified Gray Level Co-occurrence Matrix (GLCM) for feature extraction. In the fourth step, image classification is performed, and the suggested optimization strategy raises the model's accuracy. A comparison with current 3D reconstruction techniques shows that the suggested method can consistently produce a watertight model with greater reconstruction accuracy. The deep-seated intricacies and distinct patterns across arthritic phases are estimated by extracting complex statistical variables combined with power spectral density. The high-dimensional data is divided into separate, easily observable groups using the Lion Optimization Algorithm and the proposed distance metric. The F1 Score and Jaccard Metric showed averages of 0.85 and 0.23, indicating effective differentiation across clusters.
Image and Vision Computing, Volume 160, Article 105574
Citations: 0
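A minimal sketch of the symmetric chamfer distance between two point sets, the quantity the abstract says is optimized during mesh refinement; the paper's geometry-dependent loss weighting is not reproduced here.

```python
# Sketch: nearest-neighbor distance in each direction, then symmetrized.
import torch

def chamfer_distance(p, q):
    """p: (N, 3) predicted surface points; q: (M, 3) target points."""
    d = torch.cdist(p, q)                         # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

p, q = torch.rand(1024, 3), torch.rand(2048, 3)
print(chamfer_distance(p, q).item())
```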
M3IF-NSST-MTV: Modified Total Variation-based multi-modal medical image fusion using Laplacian energy and morphology in the NSST domain
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-11 · DOI: 10.1016/j.imavis.2025.105581
Dev Kumar Chaudhary, Prabhishek Singh, Achyut Shankar, Manoj Diwakar
Abstract: This paper presents a new multi-modal medical image fusion (M3IF) technique that fuses medical images obtained from different imaging modalities, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Single Photon Emission Computed Tomography (SPECT), or Positron Emission Tomography (PET), into a single image. This single image is enhanced and contains all the important information of the source images. The proposed hybrid technique, M3IF-NSST-MTV, decomposes the input medical images using the Non-Subsampled Shearlet Transform (NSST) into low-frequency coefficients (LFCs) and high-frequency coefficients (HFCs). The LFCs are fused using Laplacian energy, and the HFCs are fused using morphology. The fused image obtained after applying the inverse NSST is directed to a modified Total Variation (TV) stage, which refines the NSST output. This modified TV output is again fused with the NSST output using a Feature Similarity Index Measure (FSIM) and Correlation Coefficient (CC)-based threshold value. The modified TV refinement is an iterative process; the results of M3IF-NSST-MTV are evaluated with the number of iterations pre-set to 200. The final fusion results are compared with prevalent non-traditional methods, and based on visual quality and quantitative metric-based analysis, M3IF-NSST-MTV is found to deliver better results than all the compared methods.
Image and Vision Computing, Volume 159, Article 105581
Citations: 0
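A hedged sketch of a Laplacian-energy fusion rule applied to two low-frequency coefficient maps, here plain NumPy arrays standing in for NSST LFCs (a real NSST decomposition would come from a dedicated toolbox).

```python
# Sketch: at each pixel, keep the coefficient whose local Laplacian energy
# (an activity measure) is higher.
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def fuse_lfc_laplacian_energy(lfc_a, lfc_b, window=5):
    ea = uniform_filter(laplace(lfc_a) ** 2, size=window)  # local Laplacian energy of A
    eb = uniform_filter(laplace(lfc_b) ** 2, size=window)  # local Laplacian energy of B
    return np.where(ea >= eb, lfc_a, lfc_b)

a, b = np.random.rand(128, 128), np.random.rand(128, 128)
print(fuse_lfc_laplacian_energy(a, b).shape)               # (128, 128)
```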
Weakly supervised camouflaged object detection based on the SAM model and mask guidance
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-09 · DOI: 10.1016/j.imavis.2025.105571
Xia Li, Xinran Liu, Lin Qi, Junyu Dong
Abstract: Camouflaged object detection (COD) from a single image is a challenging task due to the high similarity between objects and their surroundings. Existing fully supervised methods require labor-intensive pixel-level annotations, making weakly supervised methods a viable compromise that balances accuracy and annotation efficiency. However, weakly supervised methods often experience performance degradation due to the use of coarse annotations. In this paper, we introduce a new weakly supervised approach for camouflaged object detection to overcome these limitations. Specifically, we propose a novel network, MGNet, which tackles edge ambiguity and missed detections by utilizing initial masks generated by our custom-designed Cascaded Mask Decoder (CMD) to guide the segmentation process and enhance edge predictions. We introduce a Context Enhancement Module (CEM) to reduce missed detections, and a Mask-guided Feature Aggregation Module (MFAM) for effective feature aggregation. For the weak supervision challenge, we propose BoxSAM, which leverages the Segment Anything Model (SAM) with bounding-box prompts to generate pseudo-labels. By employing a redundant processing strategy, high-quality pixel-level pseudo-labels are provided for training MGNet. Extensive experiments demonstrate that our method delivers competitive performance against current state-of-the-art methods.
Image and Vision Computing, Volume 159, Article 105571
Citations: 0
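A hedged sketch of generating a pixel-level pseudo-label from a bounding-box prompt with the public segment-anything API, as the BoxSAM description suggests; the checkpoint path and box are placeholders, and the paper's redundant-processing strategy is not reproduced.

```python
# Sketch: prompt SAM with a bounding box and take the single best mask as a
# binary pseudo-label for training the weakly supervised segmenter.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

def box_to_pseudo_label(image_rgb, box_xyxy):
    """image_rgb: (H, W, 3) uint8 array; box_xyxy: [x0, y0, x1, y1]."""
    predictor.set_image(image_rgb)
    masks, scores, _ = predictor.predict(box=np.asarray(box_xyxy),
                                         multimask_output=False)
    return masks[0].astype(np.uint8)   # binary pseudo-mask
```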
Calibrated gradient descent of convolutional neural networks for embodied visual recognition
IF 4.2 · CAS Tier 3 (Computer Science)
Image and Vision Computing · Pub Date: 2025-05-08 · DOI: 10.1016/j.imavis.2025.105568
Zhiming Wang, Sheng Xu, Li’an Zhuo, Baochang Zhang, Yanjing Li, Zhenqian Wang, Guodong Guo
Abstract: Embodied visual computing seeks to learn from the real world, which requires efficient machine learning methods. In conventional stochastic gradient descent (SGD) and its variants, the gradient estimators are expensive to compute in many scenarios. This paper introduces a calibrated gradient descent (CGD) algorithm for efficient deep neural network optimization. A theorem is developed to prove that an unbiased estimator for network parameters can be obtained in a probabilistic way based on the Lipschitz hypothesis. We implement our CGD algorithm on top of the widely used SGD and ADAM optimizers, realizing a generic gradient calibration layer (GClayer) that can be used to improve the performance of convolutional neural networks (C-CNNs). Our GClayer introduces extra parameters only during training and does not affect the efficiency of inference. Our method is generic and effective for optimizing both CNNs and quantized neural networks (C-QNNs). Extensive experimental results demonstrate that our method achieves state-of-the-art performance on a variety of tasks. For example, our 1-bit Faster-RCNN achieved by C-QNN obtains 20.5% mAP on COCO, a new state-of-the-art. This work brings new insights for developing more efficient optimizers and analyzing the back-propagation algorithm.
Image and Vision Computing, Volume 160, Article 105568
Citations: 0
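A heavily hedged sketch showing only the mechanism by which per-parameter gradients can be calibrated during back-propagation via hooks; the abstract's probabilistic, Lipschitz-based calibration rule is the paper's own, and the norm-scaling below is merely a stand-in.

```python
# Sketch: register a hook on each parameter so its gradient is rescaled
# before the optimizer sees it.
import torch
import torch.nn as nn

def attach_gradient_calibration(module: nn.Module, max_norm: float = 1.0):
    for p in module.parameters():
        if p.requires_grad:
            # The hook runs on every backward pass; its return value replaces
            # the gradient stored in p.grad.
            p.register_hook(
                lambda g: g * torch.clamp(max_norm / (g.norm() + 1e-12), max=1.0)
            )

model = nn.Conv2d(3, 8, 3)
attach_gradient_calibration(model)
loss = model(torch.randn(1, 3, 16, 16)).pow(2).mean()
loss.backward()   # gradients are rescaled by the hooks before being stored
```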