Make Large Language Models Efficient: A Review

IF 3.6 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access Pub Date : 2025-09-02 DOI:10.1109/ACCESS.2025.3605110

Aman Mussa;Zhanseit Tuimebayev;Madina Mansurova

IEEE Access, vol. 13, pp. 154466-154490. Journal Article. Full text: https://ieeexplore.ieee.org/document/11146704/

Citations: 0

Abstract

Large Language Models (LLMs) have achieved remarkable success across a variety of natural language processing tasks, with larger architectures often exhibiting superior performance. This scaling behavior has fueled intense competition in generative AI, supported by projected investments that exceed $1 trillion to develop increasingly sophisticated LLMs. This competition has in turn nurtured a vibrant ecosystem, inspiring new open-source models such as DeepSeek, and motivating application developers to harness state-of-the-art LLMs for real-world deployments. However, the extensive memory and computational requirements of large models present serious obstacles for small-medium organizations, leading to significant scalability concerns. This paper offers a comprehensive review of recent techniques to improve LLM efficiency through four categories: parameter-centric, architecture-centric, training-centric and data-centric. For a better understanding of the newcomer’s perspective, it covers the entire lifecycle when developing and deploying LLMs. Thus, this paper is organized around five core tasks: model compression for local deployment, accelerated pre-training to reduce time-to-train, efficient fine-tuning on custom data, optimized inference under resource constraints, and streamlined data preparation. Rather than focusing on broad strategies, we emphasize specialized techniques tailored to each stage of development. By applying targeted optimizations at each phase, the computational overhead can be reduced by 50–95% without compromising the quality of the model, making LLMs more accessible to researchers and practitioners with limited computational resources.


Source journal

IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.