Federated fine-grained prompts for vision-language models based on open-vocabulary object detection
Authors: Yu Li
Journal: Applied Intelligence, volume 55, issue 7 (impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-04-04
DOI: 10.1007/s10489-025-06527-w
URL: https://link.springer.com/article/10.1007/s10489-025-06527-w
Citation count: 0
Abstract
Vision-language models can be used for open-vocabulary object detection. Existing methods suffer from low matching accuracy between prompts and image regions, and from limited generalization, because they adopt centralized training that ignores data heterogeneity. To alleviate these issues, we propose FFPLearning, a federated fine-grained prompt learning method for open-vocabulary object detection with vision-language models. Specifically, FFPLearning quantifies the quality of region proposals using pre-fused EoG (Energy of Gradient) and IoU (Intersection over Union) scores and organizes the proposals into groups. Learnable fine-grained prompts are then trained to align with the grouped region proposals in the feature space. A momentum update algorithm assesses the quality of each client participating in federated learning. In addition, a Transformer-based feedback aggregation algorithm exploits the semantic information in the prompts and aggregates them according to client quality. Comprehensive evaluations on the COCO and LVIS datasets show improvements of +5.8 Novel AP50 and +3.3 APr over existing state-of-the-art methods.
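The abstract names three ingredients whose mechanics a small example can make concrete: IoU scoring of proposals, a momentum update of per-client quality, and quality-based aggregation of prompts. The sketch below is illustrative only and is not the paper's implementation. IoU follows its standard definition. The momentum update is assumed here to be an exponential moving average with a hypothetical coefficient `beta`. The Transformer-based feedback aggregation is replaced by a plain quality-weighted average, which is a deliberate simplification. EoG scoring and the score fusion are omitted because the abstract does not specify them. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def momentum_update(prev_quality: float, observed: float, beta: float = 0.9) -> float:
    """Assumed form of the momentum update: an exponential moving average.

    `beta` is a hypothetical smoothing coefficient, not a value from the paper.
    """
    return beta * prev_quality + (1.0 - beta) * observed


def aggregate_prompts(
    prompts: Dict[str, List[float]], quality: Dict[str, float]
) -> List[float]:
    """Quality-weighted average of client prompt vectors.

    This stands in for the Transformer-based feedback aggregation, which
    the abstract does not describe in enough detail to reproduce.
    """
    total = sum(quality[client] for client in prompts)
    if total <= 0:
        raise ValueError("client qualities must sum to a positive value")
    dim = len(next(iter(prompts.values())))
    merged = [0.0] * dim
    for client, vector in prompts.items():
        weight = quality[client] / total
        for i, value in enumerate(vector):
            merged[i] += weight * value
    return merged


if __name__ == "__main__":
    # Two overlapping 2x2 boxes share a 1x1 region.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # A client whose running quality is 0.5 reports a round score of 1.0.
    print(momentum_update(0.5, 1.0))
    # The higher-quality client "a" dominates the merged prompt.
    print(aggregate_prompts({"a": [1.0, 0.0], "b": [0.0, 1.0]}, {"a": 3.0, "b": 1.0}))
```

In this simplified form a client with a higher running quality pulls the merged prompt toward its own prompt vector, which is the intuition behind aggregating "based on the qualities of clients".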
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.