{"title":"基于知识蒸馏和低秩分解的轻量级预训练韩语模型。","authors":"Jin-Hwan Kim, Young-Seok Choi","doi":"10.3390/e27040379","DOIUrl":null,"url":null,"abstract":"<p><p>Natural Language Processing (NLP) stands as a forefront of artificial intelligence research, empowering computational systems to comprehend and process human language as used in everyday contexts. Language models (LMs) underpin this field, striving to capture the intricacies of linguistic structure and semantics by assigning probabilities to sequences of words. The trend towards large language models (LLMs) has shown significant performance improvements with increasing model size. However, the deployment of LLMs on resource-limited devices such as mobile and edge devices remains a challenge. This issue is particularly pronounced in languages other than English, including Korean, where pre-trained models are relatively scarce. Addressing this gap, we introduce a novel lightweight pre-trained Korean language model that leverages knowledge distillation and low-rank factorization techniques. Our approach distills knowledge from a 432 MB (approximately 110 M parameters) teacher model into student models of substantially reduced sizes (e.g., 53 MB ≈ 14 M parameters, 35 MB ≈ 13 M parameters, 30 MB ≈ 11 M parameters, and 18 MB ≈ 4 M parameters). The smaller student models further employ low-rank factorization to minimize the parameter count within the Transformer's feed-forward network (FFN) and embedding layer. We evaluate the efficacy of our lightweight model across six established Korean NLP tasks. Notably, our most compact model, KR-ELECTRA-Small-KD, attains over 97.387% of the teacher model's performance despite an 8.15× reduction in size. Remarkably, on the NSMC sentiment classification benchmark, KR-ELECTRA-Small-KD surpasses the teacher model with an accuracy of 89.720%. These findings underscore the potential of our model as an efficient solution for NLP applications in resource-constrained settings.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 4","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026428/pdf/","citationCount":"0","resultStr":"{\"title\":\"Lightweight Pre-Trained Korean Language Model Based on Knowledge Distillation and Low-Rank Factorization.\",\"authors\":\"Jin-Hwan Kim, Young-Seok Choi\",\"doi\":\"10.3390/e27040379\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Natural Language Processing (NLP) stands as a forefront of artificial intelligence research, empowering computational systems to comprehend and process human language as used in everyday contexts. Language models (LMs) underpin this field, striving to capture the intricacies of linguistic structure and semantics by assigning probabilities to sequences of words. The trend towards large language models (LLMs) has shown significant performance improvements with increasing model size. However, the deployment of LLMs on resource-limited devices such as mobile and edge devices remains a challenge. This issue is particularly pronounced in languages other than English, including Korean, where pre-trained models are relatively scarce. Addressing this gap, we introduce a novel lightweight pre-trained Korean language model that leverages knowledge distillation and low-rank factorization techniques. 
Our approach distills knowledge from a 432 MB (approximately 110 M parameters) teacher model into student models of substantially reduced sizes (e.g., 53 MB ≈ 14 M parameters, 35 MB ≈ 13 M parameters, 30 MB ≈ 11 M parameters, and 18 MB ≈ 4 M parameters). The smaller student models further employ low-rank factorization to minimize the parameter count within the Transformer's feed-forward network (FFN) and embedding layer. We evaluate the efficacy of our lightweight model across six established Korean NLP tasks. Notably, our most compact model, KR-ELECTRA-Small-KD, attains over 97.387% of the teacher model's performance despite an 8.15× reduction in size. Remarkably, on the NSMC sentiment classification benchmark, KR-ELECTRA-Small-KD surpasses the teacher model with an accuracy of 89.720%. These findings underscore the potential of our model as an efficient solution for NLP applications in resource-constrained settings.</p>\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 4\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026428/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27040379\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27040379","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
Lightweight Pre-Trained Korean Language Model Based on Knowledge Distillation and Low-Rank Factorization.
Natural Language Processing (NLP) stands at the forefront of artificial intelligence research, empowering computational systems to comprehend and process human language as used in everyday contexts. Language models (LMs) underpin this field, striving to capture the intricacies of linguistic structure and semantics by assigning probabilities to sequences of words. The shift toward large language models (LLMs) has brought significant performance improvements with increasing model size. However, deploying LLMs on resource-limited platforms such as mobile and edge devices remains a challenge. This issue is particularly pronounced in languages other than English, including Korean, where pre-trained models are relatively scarce. Addressing this gap, we introduce a novel lightweight pre-trained Korean language model that leverages knowledge distillation and low-rank factorization techniques. Our approach distills knowledge from a 432 MB teacher model (approximately 110 M parameters) into student models of substantially reduced sizes (e.g., 53 MB ≈ 14 M parameters, 35 MB ≈ 13 M parameters, 30 MB ≈ 11 M parameters, and 18 MB ≈ 4 M parameters). The smaller student models further employ low-rank factorization to minimize the parameter count within the Transformer's feed-forward network (FFN) and embedding layer. We evaluate the efficacy of our lightweight model across six established Korean NLP tasks. Notably, our most compact model, KR-ELECTRA-Small-KD, attains over 97.387% of the teacher model's performance despite an 8.15× reduction in size. Remarkably, on the NSMC sentiment classification benchmark, KR-ELECTRA-Small-KD surpasses the teacher model with an accuracy of 89.720%. These findings underscore the potential of our model as an efficient solution for NLP applications in resource-constrained settings.
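The abstract names two compression techniques, soft-target knowledge distillation and low-rank factorization of the FFN and embedding layer, without spelling out their mechanics. The PyTorch sketch below is a minimal illustration of both ideas under stated assumptions: the module names, ranks, dimensions, temperature, and loss form are illustrative choices, not the authors' actual configuration.

```python
# Minimal sketch (not the authors' code): low-rank FFN/embedding factorization
# and a soft-target distillation loss for a generic Transformer student.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankFFN(nn.Module):
    """Feed-forward block whose dense projection W (d x 4d) is replaced by two
    thinner matrices A (d x r) and B (r x 4d), cutting parameters from roughly
    4*d^2 to 5*r*d per projection (ignoring biases) when r << d."""

    def __init__(self, d_model: int, d_ff: int, rank: int):
        super().__init__()
        self.up = nn.Sequential(                   # W_up  ~= A_up @ B_up
            nn.Linear(d_model, rank, bias=False),
            nn.Linear(rank, d_ff),
        )
        self.down = nn.Sequential(                 # W_down ~= A_down @ B_down
            nn.Linear(d_ff, rank, bias=False),
            nn.Linear(rank, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.up(x)))


class FactorizedEmbedding(nn.Module):
    """Vocabulary embedding factorized into a small lookup table (V x e)
    followed by a projection (e x d), in the style of ALBERT."""

    def __init__(self, vocab_size: int, embed_dim: int, d_model: int):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, d_model, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.lookup(token_ids))


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-target KD loss: KL divergence between temperature-scaled teacher
    and student distributions, scaled by T^2 to preserve gradient magnitude."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)


if __name__ == "__main__":
    # Toy shapes only; the real models' dimensions are not given in the abstract.
    batch, seq, vocab, d_model, d_ff, rank = 2, 8, 32000, 256, 1024, 64
    emb = FactorizedEmbedding(vocab, embed_dim=128, d_model=d_model)
    ffn = LowRankFFN(d_model, d_ff, rank)
    x = ffn(emb(torch.randint(0, vocab, (batch, seq))))
    loss = distillation_loss(torch.randn(batch, vocab), torch.randn(batch, vocab))
    print(x.shape, float(loss))
```

With the toy shapes above, a single up-projection shrinks from 256 × 1024 ≈ 262 K weights to 64 × (256 + 1024) ≈ 82 K, which mirrors, at small scale, the kind of FFN and embedding savings the abstract reports for the student models.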
Journal description:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental work in as much detail as possible; there is no restriction on the length of papers. Where computations or experiments are reported, sufficient detail must be provided so that the results can be reproduced.