Fudong Li, Yuxiang Zhou, Qinglai Yang, Yong Chen, Dell Zhang, Xuelong Li
DOI: 10.1016/j.ipm.2025.104408
Information Processing & Management, Volume 63, Issue 2, Article 104408. Published 2025-09-25. Impact Factor 6.9; JCR Q1 (Computer Science, Information Systems).
Citations: 0
Abstract
QuFiH: Hybrid low-bit quantization and block-level parameter efficient fine-tuning for video hashing
Current video hashing methods lack large-model-based deep feature representation capabilities, resulting in suboptimal retrieval performance on large-scale video databases. To address this, we propose Video-HLM (Video-Hashing Large Model), built upon the pretrained Video-LLaMA. However, its computational demands hinder practical deployment. To balance efficiency and performance, we develop a novel video hashing framework that fuses hybrid low-bit Quantization and block-level parameter-efficient Fine-tuning (QuFiH). Specifically, QuFiH employs 4-bit quantization for the foundation model and 2-bit quantization for the hash head, optimizing storage and performance. Additionally, we propose block-level LoRA/Propulsion, reducing redundant parameters while maintaining model expressiveness at lower computational overhead. Furthermore, we explore a “distill-then-finetune” strategy, combining knowledge distillation with downstream-task fine-tuning to enhance generalization and retrieval performance. Extensive experiments on public datasets demonstrate QuFiH’s superiority: it compresses model parameters by 6.84× while outperforming state-of-the-art hashing methods in mAP@100, even surpassing the full-precision Video-HLM. QuFiH achieves dual advancements in efficiency and accuracy, offering a practical solution for large-scale video retrieval in resource-constrained environments. Code is available at: https://github.com/kydbj/QuFiH.
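The hybrid low-bit scheme in the abstract can be illustrated with a minimal sketch: symmetric uniform round-to-nearest ("fake") quantization applied at two precisions, 4-bit for the backbone (foundation model) weights and 2-bit for the hash head. This is a generic illustration under those assumptions, not the paper's actual quantizer; the function name `quantize_dequantize` and the random weights are hypothetical.

```python
import random

def quantize_dequantize(weights, n_bits):
    """Symmetric uniform 'fake' quantization of a weight list to n_bits:
    map each weight to the nearest of 2**n_bits - 1 signed integer levels,
    then scale back to floating point."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

# Hybrid low-bit scheme: 4-bit backbone, 2-bit hash head (illustrative weights).
random.seed(0)
backbone_w = [random.gauss(0, 1) for _ in range(64)]
hash_head_w = [random.gauss(0, 1) for _ in range(32)]

backbone_q = quantize_dequantize(backbone_w, 4)    # 15 usable levels
hash_head_q = quantize_dequantize(hash_head_w, 2)  # 3 levels: {-s, 0, +s}
```

A real deployment would store the integer codes plus the per-tensor scale rather than the dequantized floats; the intuition the sketch captures is that the small hash head tolerates coarser (2-bit) levels, while the backbone keeps finer (4-bit) resolution.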
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.