Temporal Feature Matters: A Framework for Diffusion Model Quantization

IF 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-03. DOI: 10.1109/TPAMI.2025.3585692

Yushi Huang;Ruihao Gong;Xianglong Liu;Jing Liu;Yuhang Li;Jiwen Lu;Dacheng Tao

Volume 47, Issue 10, pp. 8823-8837. Publication type: Journal Article. https://ieeexplore.ieee.org/document/11068163/

Citations: 0

Abstract

Diffusion models are widely used for image generation, but prolonged inference times and high memory demands limit their broad applicability. Efficient Post-Training Quantization (PTQ) is crucial to address these issues. However, unlike traditional models, diffusion models rely critically on the time-step for multi-round denoising. Typically, several modules encode each time-step into a hypersensitive temporal feature. Despite this, existing PTQ methods do not optimize these modules individually. Instead, they employ unsuitable reconstruction objectives and complex calibration methods, leading to significant disturbances in the temporal feature and denoising trajectory, as well as reduced compression efficiency. To address these challenges, we introduce a novel quantization framework that includes three strategies: 1) TIB-based Maintenance: Based on our Temporal Information Block (TIB) definition, we develop Temporal Information-aware Reconstruction (TIAR) and Finite Set Calibration (FSC) to efficiently align the original temporal features. 2) Cache-based Maintenance: Instead of indirect and complex optimization of the related modules, we pre-compute and cache the quantized counterparts of the temporal features to minimize errors. 3) Disturbance-aware Selection: We use temporal feature errors to guide a fine-grained selection between the two maintenance strategies, further reducing disturbance. This framework preserves most of the temporal information and ensures high-quality end-to-end generation. Extensive experiments on various datasets, diffusion models, and hardware confirm its superior performance and acceleration.
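The idea behind cache-based maintenance can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes that a sampler visits a finite, known set of time-steps, so the temporal feature for each one can be computed offline, quantized, and looked up at inference time. The sinusoidal embedding stands in for the real temporal modules, and all names (`sinusoidal_embedding`, `quantize`, `build_cache`, `temporal_feature`) are hypothetical.

```python
import math

def sinusoidal_embedding(t, dim):
    """Sinusoidal time-step embedding; a stand-in for the model's temporal modules."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def quantize(values, bits=8):
    """Symmetric uniform quantization: returns integer codes and a per-vector scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

def build_cache(timesteps, dim=8, bits=8):
    """Precompute and quantize the temporal feature for every time-step in the finite set."""
    return {t: quantize(sinusoidal_embedding(t, dim), bits) for t in timesteps}

# Assumed sampling schedule: 10 evenly spaced time-steps out of 1000.
timesteps = list(range(0, 1000, 100))
cache = build_cache(timesteps)

def temporal_feature(t):
    """Inference-time lookup: no temporal module is executed, only a cache read."""
    codes, scale = cache[t]
    return dequantize(codes, scale)

# Largest element-wise error introduced by caching the quantized features.
max_err = max(
    abs(a - b)
    for t in timesteps
    for a, b in zip(sinusoidal_embedding(t, 8), temporal_feature(t))
)
```

A per-time-step error such as `max_err` is also the kind of signal that disturbance-aware selection could compare across the two maintenance strategies, although the exact metric and selection rule used in the paper are not given in the abstract.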


