PFedLAH：面向自适应跨模态哈希的前瞻性个性化联邦学习

IF 11.1 1区工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-12 DOI:10.1109/TCSVT.2025.3550794

Yunfei Chen;Hongyu Lin;Zhan Yang;Jun Long

{"title":"PFedLAH：面向自适应跨模态哈希的前瞻性个性化联邦学习","authors":"Yunfei Chen;Hongyu Lin;Zhan Yang;Jun Long","doi":"10.1109/TCSVT.2025.3550794","DOIUrl":null,"url":null,"abstract":"Cross-modal hashing enables efficient cross-modal retrieval by compressing multi-modal data into compact binary codes, but traditional methods primarily rely on centralized training, which is limited when handling large-scale distributed datasets. Federated learning presents a scalable alternative, yet existing federated frameworks for cross-modal hashing face challenges like data heterogeneity and imbalance, such as non-IID data distribution across clients. To address these challenges, we propose Personalized Federated learning with Lookahead for Adaptive cross-modal Hashing (PFedLAH) method, which combines Feature Adaptive Personalized Learning (FAPL) and Weight-aware Lookahead Adaptive Selection (WLAS) mechanism together. Initially, the FAPL module is designed for the client, enabling personalized learning to mitigate the effect of divergence between server and client resulting from non-IID data distribution, while the local optimization constraint mechanism is also integrated to avoid local optimization shift and ensure better alignment with global convergence. On the server side, WLAS module combines weight-aware adaptive client selection and gradient momentum lookahead to form a dynamic and intelligent client selection scheme, while enhancing the overall convergence and consistency through lookahead gradient prediction. Comprehensive experiments on widely used datasets, including MIRFlickr-25K, MS COCO, and NUS-WIDE, comparing state-of-the-art federated hashing methods, demonstrate the superior retrieval performance, robustness, and scalability of the PFedLAH method.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8359-8371"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PFedLAH: Personalized Federated Learning With Lookahead for Adaptive Cross-Modal Hashing\",\"authors\":\"Yunfei Chen;Hongyu Lin;Zhan Yang;Jun Long\",\"doi\":\"10.1109/TCSVT.2025.3550794\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-modal hashing enables efficient cross-modal retrieval by compressing multi-modal data into compact binary codes, but traditional methods primarily rely on centralized training, which is limited when handling large-scale distributed datasets. Federated learning presents a scalable alternative, yet existing federated frameworks for cross-modal hashing face challenges like data heterogeneity and imbalance, such as non-IID data distribution across clients. To address these challenges, we propose Personalized Federated learning with Lookahead for Adaptive cross-modal Hashing (PFedLAH) method, which combines Feature Adaptive Personalized Learning (FAPL) and Weight-aware Lookahead Adaptive Selection (WLAS) mechanism together. Initially, the FAPL module is designed for the client, enabling personalized learning to mitigate the effect of divergence between server and client resulting from non-IID data distribution, while the local optimization constraint mechanism is also integrated to avoid local optimization shift and ensure better alignment with global convergence. On the server side, WLAS module combines weight-aware adaptive client selection and gradient momentum lookahead to form a dynamic and intelligent client selection scheme, while enhancing the overall convergence and consistency through lookahead gradient prediction. Comprehensive experiments on widely used datasets, including MIRFlickr-25K, MS COCO, and NUS-WIDE, comparing state-of-the-art federated hashing methods, demonstrate the superior retrieval performance, robustness, and scalability of the PFedLAH method.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 8\",\"pages\":\"8359-8371\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10924221/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10924221/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

跨模态哈希通过将多模态数据压缩成紧凑的二进制代码来实现高效的跨模态检索，但传统方法主要依赖于集中训练，这在处理大规模分布式数据集时受到限制。联邦学习提供了一种可伸缩的替代方案，但是现有的跨模态散列的联邦框架面临着数据异构和不平衡等挑战，例如跨客户机的非iid数据分布。为了解决这些挑战，我们提出了结合特征自适应个性化学习（FAPL）和权重感知前瞻自适应选择（WLAS）机制的面向自适应跨模散列的个性化联邦学习（PFedLAH）方法。FAPL模块最初是为客户端设计的，实现了个性化学习，缓解了非iid数据分布导致的服务器和客户端之间的差异影响，同时集成了局部优化约束机制，避免了局部优化偏移，确保更好地与全局收敛保持一致。在服务器端，WLAS模块将权重感知自适应客户端选择与梯度动量前瞻相结合，形成动态智能的客户端选择方案，同时通过梯度前瞻预测增强整体的收敛性和一致性。在广泛使用的数据集（包括MIRFlickr-25K、MS COCO和NUS-WIDE）上进行综合实验，比较了最先进的联邦散列方法，证明了PFedLAH方法优越的检索性能、鲁棒性和可扩展性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

PFedLAH: Personalized Federated Learning With Lookahead for Adaptive Cross-Modal Hashing

Cross-modal hashing enables efficient cross-modal retrieval by compressing multi-modal data into compact binary codes, but traditional methods primarily rely on centralized training, which is limited when handling large-scale distributed datasets. Federated learning presents a scalable alternative, yet existing federated frameworks for cross-modal hashing face challenges like data heterogeneity and imbalance, such as non-IID data distribution across clients. To address these challenges, we propose Personalized Federated learning with Lookahead for Adaptive cross-modal Hashing (PFedLAH) method, which combines Feature Adaptive Personalized Learning (FAPL) and Weight-aware Lookahead Adaptive Selection (WLAS) mechanism together. Initially, the FAPL module is designed for the client, enabling personalized learning to mitigate the effect of divergence between server and client resulting from non-IID data distribution, while the local optimization constraint mechanism is also integrated to avoid local optimization shift and ensure better alignment with global convergence. On the server side, WLAS module combines weight-aware adaptive client selection and gradient momentum lookahead to form a dynamic and intelligent client selection scheme, while enhancing the overall convergence and consistency through lookahead gradient prediction. Comprehensive experiments on widely used datasets, including MIRFlickr-25K, MS COCO, and NUS-WIDE, comparing state-of-the-art federated hashing methods, demonstrate the superior retrieval performance, robustness, and scalability of the PFedLAH method.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Circuits and Systems for Video Technology 工程技术-工程：电子与电气

CiteScore

13.80

自引率

27.40%

发文量

660

审稿时长

5 months

期刊介绍： The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.