Applied Intelligence: Latest Articles

DIMCAR: dynamic intent modeling and context-aware recommendations in sparse data environment towards next basket prediction
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-10-03. DOI: 10.1007/s10489-025-06796-5
John Kingsley Arthur, Conghua Zhou, Xiang-Jun Shen, Ronky Wrancis Amber-Doh, Eric Appiah Mantey, Jeremiah Osei-Kwakye
Abstract: In the fast-changing world of e-commerce, the success of recommender systems is crucial for boosting user engagement and increasing sales. Conventional models often struggle with evolving user preferences and data sparsity, hindering accurate predictions. Existing graph-based regularization mechanisms and deep learning approaches address these challenges but remain sensitive to noise and computational complexity, limiting their effectiveness in large-scale, real-time settings. We propose a novel multi-layered Next Basket Recommender System, the Dynamic Intent Modelling and Context-Aware Recommendation (DIMCAR) model, to overcome these limitations. First, we resolve the data sparsity problem by constructing a novel Optimized Graph Sparse Regularization framework for Non-negative Matrix Factorization (OGSR-NMF) that integrates a time-varying graph structure, a novel hybrid sparsity norm, and a modified Proximal Alternating Linearized Minimization (mPALM) algorithm. Additionally, we dynamically model user intents and context using attention mechanisms and Gated Recurrent Units (GRUs). Finally, we integrate a novel Adaptive Reptile Basket Optimization Algorithm into a Deep Convolutional Neural Network, enhancing the model's adaptability to changing user behaviours in real time. Theoretical analysis and experiments on four benchmark datasets demonstrate that DIMCAR outperforms existing models in recommendation accuracy and user satisfaction.
Citations: 0
EGPT-SPE: story point effort estimation using improved GPT-2 by removing inefficient attention heads
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-10-02. DOI: 10.1007/s10489-025-06824-4
Amna Shahid Cheemaa, Muhammad Azhar, Fahim Arif, Qazi Mazhar ul haq, Muhammad Sohail, Asma Iqbal
Abstract: Estimating story points from user requirements is crucial in the Software Development Life Cycle (SDLC) because it affects resource allocation and timelines; inaccuracies can lead to missed deadlines and increased costs, harming a company’s reputation. While various techniques have emerged to automate this process, conventional machine learning methods often fail to understand the context of user requirements, and deep learning approaches face high computational costs. To address these issues, the Efficient GPT for Story Point Estimation (EGPT-SPE) algorithm optimizes the Multi-Head Attention module by removing inefficient heads, enhancing accuracy and reducing costs. Experiments on the Choetkiertikul dataset (23,313 issues across 16 open-source projects) and the TAWOS dataset (458,232 issues across 39 open-source projects from 12 public JIRA repositories) demonstrated a 5 to 15 percent accuracy improvement in both within-project and cross-project estimations, validating the algorithm’s effectiveness in agile story point estimation.
Citations: 0
RXNet: cross-modality person re-identification based on a dual-branch network
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-10-01. DOI: 10.1007/s10489-025-06501-6
Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng
Abstract: The goal of text-based person re-identification (TI-ReID) is to match individuals by integrating information from both images and text. TI-ReID encounters significant challenges because of the clear differences in features between images and textual descriptions. Contemporary techniques commonly merge general and specific characteristics to obtain more detailed feature representations. However, these techniques depend on additional models for estimating or segmenting human poses to determine local characteristics, making them challenging to apply in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image branch, RegNet is employed to acquire multitiered semantic image attributes and dynamically assimilate distinct local features through visual focus. In the text branch, XLNet is utilized to extract significant semantic attributes from the text via a two-way encoding system based on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across different modalities. Additionally, we replace cross-entropy ID loss with smoothing ID loss to prevent overfitting while improving the efficiency of the model. Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming the current state-of-the-art methods by a large margin.
Citations: 0
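The RXNet abstract mentions replacing cross-entropy ID loss with a smoothing ID loss but does not give its form. The sketch below shows standard label-smoothed cross-entropy in plain Python, on the assumption that the paper's loss is of this common kind. The function names and the smoothing factor `epsilon=0.1` are illustrative and not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of identity logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_id_loss(logits, target, epsilon=0.1):
    """Cross-entropy against a smoothed one-hot target.

    The target distribution puts (1 - epsilon) on the true identity and
    spreads epsilon uniformly over all identities (assumed form).
    With epsilon = 0 this reduces to ordinary cross-entropy.
    """
    k = len(logits)
    log_probs = log_softmax(logits)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = epsilon / k + ((1.0 - epsilon) if i == target else 0.0)
        loss -= q * lp
    return loss
```

Because the smoothed target never reaches probability 1 for any identity, a very confident correct prediction still incurs a small loss, which is the usual argument for why smoothing discourages overfitting.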
Deep learning techniques for point cloud tasks: a review
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-30. DOI: 10.1007/s10489-025-06854-y
Xiaona Song, Haozhe Zhang, Lijun Wang, Jinxing Niu, Ying Zhu, Junjie Nian, Ruixue Cheng
Abstract: As a significant means of representing 3D scenes, point clouds are extensively utilized in various fields such as computer vision, autonomous driving, robotic interaction, and urban modeling. Deep learning has achieved remarkable success in the realm of two-dimensional images, and its application to three-dimensional point clouds is progressively gaining traction. However, the irregular and unstructured nature of point cloud data presents numerous challenges when applying deep learning algorithms to these 3D representations. To foster future research endeavors, this paper concentrates on three fundamental tasks associated with point clouds: classification, object detection, and semantic segmentation. It systematically reviews the current state of development of deep learning algorithms pertinent to these tasks. By organizing and analyzing existing literature alongside experimental results derived from publicly available datasets, this paper compares the strengths of different methodologies while also highlighting their limitations. Ultimately, it summarizes the technical challenges encountered in advancing deep learning algorithms for point clouds and outlines potential avenues for progress within this domain.
Citations: 0
Balancing act: engagement detection in online learning through master-assistant models with an enhanced hierarchical attention mechanism
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-30. DOI: 10.1007/s10489-025-06893-5
Tingting Han, Ruqian Liu, Shuwei Dou, Wei Wang, Xiaoming Ding, Wenxia Zhang, Jihao Lang, Wenxuan Li, Jixing Han
Abstract: The rapid expansion of online learning calls for effective approaches to monitor and boost student engagement, a key element influencing learning outcomes. Class imbalance within engagement datasets poses substantial challenges to precise detection and classification. Existing methods for detecting student engagement in online learning adopt weighted loss to address class imbalance in public datasets. However, because appropriate weights are hard to select and overfitting is a risk, the effectiveness of this approach often relies on extensive experiments for manual adjustment. To tackle this problem, we propose a Master-Assistant model that addresses the performance degradation caused by class imbalance and thereby ensures effective detection of student engagement. The Assistant model performs coarse-grained classification according to different assistant strategies, assisting the Master model in fine-grained classification. Furthermore, we extract multiple engagement-related handcrafted features and assign them different weights via an enhanced hierarchical attention mechanism. Finally, an accuracy of 70.69% and an F1-score of 68% are achieved on the Dataset for Affective States in E-Environments (DAiSEE), setting new state-of-the-art (SOTA) scores. Additionally, experiments on three other imbalanced datasets validate the robustness of the Master-Assistant model in solving the class imbalance problem.
Citations: 0
Detection method for improving shape perception of small object defects on metal surfaces
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-29. DOI: 10.1007/s10489-025-06873-9
Xingfei Zhu, Christophe Montagne, Qimeng Wang, Lingxiang Hu, Jinghu Yu, Hedi Tabia, Qianqian Hu
Abstract: Defects on metal surfaces are often complex, with diverse shapes, small sizes, and irregular patterns, leading to frequent missed and false detections during inspection and posing significant challenges to automated detection systems. Existing advanced object detectors, when applied directly to small defect detection on metal surfaces, fail to achieve satisfactory results. To mitigate these issues, we propose MetalYOLO, a detection method that enhances the shape perception of small object defects on metal surfaces. First, a novel location-aware attention mechanism is integrated with deformable convolutions to form a new feature selection module that enhances the focus on key defect features, optimizes the generation of offsets, and improves the model’s ability to adapt to objects with complex shapes. Second, the standard up-sampling module is replaced with a dynamic sampling module that adjusts the sampling pattern to the input feature distribution, improving computational efficiency and retaining complex or small-scale object features, thereby improving detection accuracy. Finally, a new detail-enhanced detection head is designed to further improve the network’s ability to capture fine-grained details; it introduces a detail-enhanced attention-sharing module that uses contextual information to selectively suppress irrelevant features, thereby reducing information redundancy. The proposed model is compared with baseline models on the ILS-MB and NEU-DET datasets, and the experimental results show significant improvements in false detection and missed detection rates with only a slight loss in inference speed. Meanwhile, the mAP reached 80.4% and 79.0%, respectively, which is 1.7% and 3.2% higher than the baseline algorithm.
Citations: 0
TransMambaCC: Integrating Transformer and Pyramid Mamba Network for RGB-T Crowd Counting
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-29. DOI: 10.1007/s10489-025-06912-5
Yangjian Chen, Huailin Zhao, Liangjun Huang, Yubo Yang, Wencan Kang, Jianwei Zhang
Abstract: RGB-T crowd counting is a challenging task that integrates RGB and thermal images to address the limitations of RGB-only approaches in scenes with poor illumination or occlusion. While transformer-based models have shown remarkable success in capturing long-range dependencies, their high computational demands limit their practical applicability. To address this issue, a novel hybrid model named TransMambaCC, which fuses the analytical strength of the transformer with the computational efficiency of Mamba, is proposed. This integration not only improves crowd analysis performance but also significantly reduces the model's computational overhead. Additionally, a Pyramid Mamba module is designed to address the head-scale variations observed in congested scenes. Extensive experiments conducted on the RGBT-CC dataset demonstrate the superiority of TransMambaCC over existing approaches in terms of both accuracy and efficiency. Furthermore, the model exhibits strong generalization capabilities, as evidenced by its performance on the ShanghaiTechRGBD dataset. The code is available at https://github.com/yjchen3250/TransMambaCC.
Citations: 0
Enhancing vitiligo stage diagnosis through a reliable multimodal model with uncertainty calibration
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-29. DOI: 10.1007/s10489-025-06839-x
Zhiming Li, Shuying Jiang, Fan Xiang, Chunying Li, Shuli Li, Tianwen Gao, Kaiqiao He, Jianru Chen, Junpeng Zhang, Junran Zhang
Abstract: Vitiligo is a common dermatological disease featuring hypopigmentation. Accurate staging of vitiligo is crucial for enhancing treatment efficacy. However, traditional diagnostic methods, which rely on physicians' subjective judgments, are time-consuming, labor-intensive, and prone to misdiagnosis. Recently, AI-powered multimodal dermatological classification models have demonstrated significant potential in this area. However, the credibility of these models at the decision-making stage requires further refinement. This study proposes a multimodal disease staging diagnostic model with uncertainty calibration to analyze multimodal samples from three stages of vitiligo. The model extracts feature information from various modalities and transforms it into a Dirichlet distribution to assess sample uncertainty. Then, Dempster-Shafer theory is used to fuse evidence from different modalities, generating a final diagnostic result and an uncertainty score. Additionally, an uncertainty-based loss function is designed. By using an uncertainty threshold method, the model can detect high-uncertainty samples that require additional judgment, effectively reducing the risk of misdiagnosis and missed diagnosis. Experimental results show that this model outperforms existing methods in terms of accuracy, precision, recall, and F1-score. Anomaly detection and noise-resistance experiments verify the model's robustness in handling unknown and noisy data. This model offers a new approach for AI-assisted vitiligo diagnosis, which can assist doctors in making more accurate diagnostic decisions and contribute to improving treatment efficiency.
Citations: 0
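The vitiligo abstract describes mapping per-modality features to a Dirichlet distribution, fusing modalities with Dempster-Shafer theory, and flagging samples above an uncertainty threshold. The sketch below is one way this pipeline is often realised in evidential multi-view classification. It assumes the subjective-logic mapping (belief = evidence / Dirichlet strength, uncertainty = K / strength) and the reduced Dempster combination rule; the paper's exact formulation is not given in the abstract. The function names and the threshold `0.5` are hypothetical.

```python
from functools import reduce


def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into (beliefs, uncertainty).

    Assumed mapping: Dirichlet parameters alpha_k = e_k + 1,
    strength S = sum(alpha), b_k = e_k / S, u = K / S.
    """
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    return [e / strength for e in evidence], k / strength


def combine(op1, op2):
    """Fuse two opinions with the reduced Dempster rule (assumed form)."""
    (b1, u1), (b2, u2) = op1, op2
    n = len(b1)
    # Conflict: belief mass the two modalities assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (b1[c] * b2[c] + b1[c] * u2 + b2[c] * u1) for c in range(n)]
    return beliefs, scale * u1 * u2


def diagnose(evidence_per_modality, threshold=0.5):
    """Return (predicted stage index, fused uncertainty, needs_review flag)."""
    opinions = [opinion_from_evidence(e) for e in evidence_per_modality]
    beliefs, uncertainty = reduce(combine, opinions)
    stage = max(range(len(beliefs)), key=beliefs.__getitem__)
    return stage, uncertainty, uncertainty > threshold
```

Under these assumptions, two modalities that agree lower the fused uncertainty, while a modality with no evidence leaves the other modality's opinion unchanged; samples whose fused uncertainty exceeds the threshold would be routed to a clinician.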
Prototypes guided model transformations between personalization and generalization in federated learning
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-29. DOI: 10.1007/s10489-025-06566-3
Yuan Xi, Qiong Li, HaoKun Mao
Abstract: Federated Learning (FL) has gained popularity due to its ability to train a collaborative model while preserving privacy. However, it still faces limitations when dealing with heterogeneous data, primarily manifesting as performance degradation of the global model and the inability of a single global model to adapt to divergent client data distributions. Although researchers have summarized these issues as the goals of generalization and personalization, few studies have addressed both goals simultaneously, with most prioritizing one over the other. In this paper, it is demonstrated that the FL iteration already incorporates model transformations between personalization and generalization, with a focus on ensuring the smooth functionality of these transformations under high data heterogeneity. Specifically, a novel Federated Prototype Transformation Framework (FedPT) is proposed, which is capable of generating a well-performing generalized model and personalized models simultaneously. FedPT constructs local prototype classifiers that explicitly guide personalized model optimization during local training, and these can be aggregated into a global prototype classifier suitable for generic tasks. The momentum update design retains the global knowledge in local training and aligns features between clients, which results in a smoother iteration. Moreover, an improved sample-level contrastive loss is presented to learn deeper representations, achieving high-quality prototype generation even for missing or imbalanced classes. Experimental results demonstrate the exceptional performance of FedPT in both generalization and personalization tasks, outperforming the latest methods.
Citations: 0
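The FedPT abstract describes local prototype classifiers that are aggregated into a global prototype classifier. The sketch below illustrates the general idea only: a prototype is taken to be the mean feature vector of a class, aggregation is weighted by per-class sample counts, and prediction uses the nearest prototype under squared Euclidean distance. All three choices are assumptions, as are the function names; the abstract does not specify them, and the momentum update and contrastive loss are omitted.

```python
from collections import defaultdict


def local_prototypes(features, labels):
    """Per-class mean feature vectors and sample counts on one client."""
    sums, counts = {}, defaultdict(int)
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    protos = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return protos, dict(counts)


def aggregate_prototypes(client_results):
    """Count-weighted average of client prototypes (assumed rule).

    Classes missing on a client simply contribute nothing from it.
    """
    total, weight = {}, defaultdict(int)
    for protos, counts in client_results:
        for y, p in protos.items():
            n = counts[y]
            if y not in total:
                total[y] = [0.0] * len(p)
            total[y] = [t + n * v for t, v in zip(total[y], p)]
            weight[y] += n
    return {y: [t / weight[y] for t in total[y]] for y in total}


def predict(protos, feature):
    """Classify by nearest prototype (squared Euclidean distance)."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, feature))
    return min(protos, key=lambda y: sq_dist(protos[y]))
```

Only prototypes and counts leave each client in this sketch, not raw features, which is in the spirit of the privacy motivation the abstract gives for federated learning.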
Visual residual aggregation network for visual-language prompt tuning
IF 3.5, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 15. Pub Date: 2025-09-29. DOI: 10.1007/s10489-025-06866-8
Yunqian Yu, Feng Guo, Xianlong Tian, Biao Chen, Mengmeng Jing, Lin Zuo
Abstract: Prompt tuning leverages a series of learnable prompts to effectively guide pre-trained visual language models (VLMs) to adapt to various downstream tasks. VLMs encode deep features from both visual and textual branches and learn the joint embedding space of the two modalities by optimizing a contrastive loss. However, existing prompt tuning methods face two critical challenges. (1) The first is the forgetting of generalized knowledge: as features propagate through the visual encoder, generalizable knowledge captured in shallow layers is gradually lost, ultimately impairing the generalization ability of the joint embedding space for new classes. (2) The second is that models trained on the base classes suffer from semantic bias. To address these issues, we propose the Visual Residual Aggregation Network for Visual-Language Prompt Tuning (VraPT). VraPT comprises two sequentially connected components: a residual aggregation module and a semantic consistency module. First, to counter the forgetting of generalized knowledge, the residual aggregation module enables adaptive fusion of generalized features, which effectively preserves generalized knowledge. It also reveals the importance of shallow features in enhancing the generalization capability of text prompts. The fused representation is then fed into the semantic consistency module, which addresses the problem of semantic bias. By minimizing the divergence from the true semantic distribution, this module enhances the semantic representations in the visual space as well as the semantic coherence of the learnable prompts. Our method enables the learned prompts to retain both discriminative semantic information and generalized knowledge. Extensive experiments show that VraPT is an effective prompt tuning method, with a particularly large improvement in recognizing new classes. On average, VraPT improves the accuracy on base classes by 1.06% and on new classes by 2.63% across 11 datasets, along with a 1.91% gain in the harmonic mean (H) metric.
Citations: 0
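The VraPT abstract reports a harmonic mean (H) metric alongside base-class and new-class accuracy without defining it. In base-to-new generalization benchmarks, H is commonly computed as 2 · base · new / (base + new); the sketch below assumes that definition. The function name and the example accuracies are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc, new_acc):
    """Harmonic mean of base-class and new-class accuracy (assumed definition of H)."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical accuracies in percent, for illustration only.
h = harmonic_mean(80.0, 60.0)
```

The harmonic mean is pulled toward the smaller of the two accuracies, so a method cannot score well on H by excelling on base classes while failing on new ones.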