{"title":"Efficient Compressing and Tuning Methods for Large Language Models: A Systematic Literature Review","authors":"Gun Il Kim, Sunga Hwang, Beakcheol Jang","doi":"10.1145/3728636","DOIUrl":null,"url":null,"abstract":"Efficient compression and tuning techniques have become indispensable in addressing the increasing computational and memory demands of large language models (LLMs). While these models have demonstrated exceptional performance across a wide range of natural language processing tasks, their growing size and resource requirements pose significant challenges to accessibility and sustainability. This survey systematically reviews state-of-the-art methods in model compression, including compression techniques such as knowledge distillation, low-rank approximation, parameter pruning, and quantization, as well as tuning techniques such as parameter-efficient fine-tuning and inference optimization. Compression techniques, though well-established in traditional deep learning, require updated methodologies tailored to the scale and dynamics of LLMs. Simultaneously, parameter-efficient fine-tuning, exemplified by techniques like Low-Rank Adaptation (LoRA) and query tuning, emerges as a promising solution for adapting models with minimal resource overhead. This study provides a detailed taxonomy of these methods, examining their practical applications, strengths, and limitations. Critical gaps are identified in scalability, and the integration of compression and tuning strategies, signaling the need for unified frameworks and hybrid approaches to maximize efficiency and performance. By addressing these challenges, this survey aims to guide researchers toward sustainable, efficient, and accessible LLM development, ensuring their broader applicability across diverse domains while mitigating resource constraints.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"4 1","pages":""},"PeriodicalIF":23.8000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3728636","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Efficient compression and tuning techniques have become indispensable in addressing the increasing computational and memory demands of large language models (LLMs). While these models have demonstrated exceptional performance across a wide range of natural language processing tasks, their growing size and resource requirements pose significant challenges to accessibility and sustainability. This survey systematically reviews state-of-the-art methods for efficient LLMs, covering compression techniques such as knowledge distillation, low-rank approximation, parameter pruning, and quantization, as well as tuning techniques such as parameter-efficient fine-tuning and inference optimization. Compression techniques, though well established in traditional deep learning, require updated methodologies tailored to the scale and dynamics of LLMs. At the same time, parameter-efficient fine-tuning, exemplified by techniques such as Low-Rank Adaptation (LoRA) and query tuning, emerges as a promising solution for adapting models with minimal resource overhead. This study provides a detailed taxonomy of these methods, examining their practical applications, strengths, and limitations. Critical gaps are identified in scalability and in the integration of compression and tuning strategies, signaling the need for unified frameworks and hybrid approaches to maximize efficiency and performance. By addressing these challenges, this survey aims to guide researchers toward sustainable, efficient, and accessible LLM development, ensuring broader applicability across diverse domains while mitigating resource constraints.
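To make the parameter-efficient fine-tuning family named in the abstract concrete, the sketch below illustrates the core idea behind LoRA: the pretrained weight matrix is frozen, and only a low-rank correction B·A (scaled by alpha/r) is trained. This is a minimal, hypothetical PyTorch example, not code from the surveyed paper; the class name LoRALinear and the values r=8 and alpha=16 are illustrative assumptions.

```python
# Minimal LoRA sketch (illustrative only; not the paper's reference implementation).
# Assumes PyTorch is installed and the frozen base layer is a plain nn.Linear.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        # Low-rank factors: A projects down to rank r, B projects back up (zero-initialized).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank correction; only A and B get gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16.0)
    out = layer(torch.randn(2, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(out.shape, f"trainable: {trainable}/{total}")
```

Because only A and B are updated, the trainable parameter count drops from in_features x out_features to r x (in_features + out_features), which is the source of the minimal resource overhead the abstract attributes to this class of methods.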
About the Journal
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.