{"title":"Exploring Posttraining Quantization of Large Language Models: An Efficiency Evaluation with a Focus on Russian-Language Tasks","authors":"D. R. Poimanov, M. S. Shutov","doi":"10.3103/S0005105525701389","DOIUrl":null,"url":null,"abstract":"<p>Quantization has become a key technique for the compression and acceleration of large language models (LLMs). Although research into low-bit quantization is actively advancing for English-language LLMs, its impact on morphologically rich and resource-diverse languages, including Russian, remains far less studied. Therefore, additional research into this problem is required, driven by the development of high-performance Russian-language and multilingual LLMs. We have conducted a systematic study of quantizing pretrained models to 2.0–4.25 bits per parameter for modern Russian-language LLMs at various scales, ranging from 4 to 32 billion parameters (4B and 32B). Our experimental setup covers both standard uniform quantization and specialized low-bit formats. Our findings highlight several key trends: (i) the tolerance of Russian-language LLMs to quantization varies across model architectures and sizes; (ii) 4-bit quantization demonstrates high robustness, particularly when advanced formats are employed; (iii) 3-bit and 2-bit quantizations prove to be the most sensitive to calibration data and scaling strategies. Empirical results show that the model’s domain must be considered when employing different quantization techniques.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S437 - S446"},"PeriodicalIF":0.5000,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.3103/S0005105525701389","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Quantization has become a key technique for the compression and acceleration of large language models (LLMs). Although research into low-bit quantization is actively advancing for English-language LLMs, its impact on morphologically rich and resource-diverse languages, including Russian, remains far less studied. Additional research into this problem is therefore required, driven by the development of high-performance Russian-language and multilingual LLMs. We have conducted a systematic study of quantizing pretrained modern Russian-language LLMs to 2.0–4.25 bits per parameter at scales ranging from 4 to 32 billion parameters (4B–32B). Our experimental setup covers both standard uniform quantization and specialized low-bit formats. Our findings highlight several key trends: (i) the tolerance of Russian-language LLMs to quantization varies across model architectures and sizes; (ii) 4-bit quantization demonstrates high robustness, particularly when advanced formats are employed; (iii) 3-bit and 2-bit quantization prove to be the most sensitive to calibration data and scaling strategies. Empirical results show that the model’s domain must be considered when employing different quantization techniques.
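For readers unfamiliar with the "standard uniform quantization" baseline the abstract refers to, the sketch below illustrates the general idea: rounding weights to a small signed-integer grid with one scale per output channel. This is a minimal, hedged illustration in Python/NumPy, not the authors' exact procedure; the bit width, per-channel scaling choice, and error metric are assumptions for demonstration only.

```python
# Minimal sketch of symmetric per-channel uniform quantization to b bits
# (round-to-nearest). Illustrative only; not the paper's actual pipeline.
import numpy as np

def quantize_uniform(weight: np.ndarray, bits: int = 4):
    """Quantize a 2-D weight matrix with one scale per output row."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for signed 4-bit
    scale = np.abs(weight).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # guard against all-zero rows
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate floating-point weight for inference."""
    return q.astype(np.float32) * scale

# Usage: quantize a random "layer" and measure the reconstruction error,
# the kind of degradation that quantization-tolerance studies track.
w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_uniform(w, bits=4)
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w - w_hat).mean())
```

Lower bit widths (3- and 2-bit) shrink the integer grid, which is why, as the abstract notes, they rely more heavily on calibration data and careful scaling strategies than the 4-bit setting.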
About the journal
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.