XAI-driven knowledge distillation of large language models for efficient deployment on low-resource devices

IF 8.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS
Riccardo Cantini, Alessio Orsino, Domenico Talia
{"title":"Xai-driven knowledge distillation of large language models for efficient deployment on low-resource devices","authors":"Riccardo Cantini, Alessio Orsino, Domenico Talia","doi":"10.1186/s40537-024-00928-3","DOIUrl":null,"url":null,"abstract":"<p>Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, Knowledge Distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents <i>DiXtill</i>, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network.Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods,thus enhancing the student interpretability. Furthermore, it enables the student to achieve comparable performance to the teacher LLM while also delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, which paves the way for more efficient and sustainable edge AI applications</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"18 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-024-00928-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, Knowledge Distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents DiXtill, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network. Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods, thus enhancing the student's interpretability. Furthermore, it enables the student to achieve comparable performance to the teacher LLM while also delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, which paves the way for more efficient and sustainable edge AI applications.
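
To make the general idea concrete, below is a minimal PyTorch sketch of XAI-guided cross-architecture distillation under stated assumptions: the teacher LLM's soft logits and per-token attribution scores are assumed to be precomputed (mocked here with random tensors), the student is a BiLSTM whose attention weights serve as its self-explanation, and the training objective combines a task loss, a standard soft-label distillation loss, and an explanation-alignment loss. The specific alignment term, loss weights, and model sizes are illustrative assumptions, not DiXtill's exact formulation.

```python
# Illustrative sketch of XAI-guided distillation (not the paper's exact method).
# Assumptions: teacher soft logits and per-token attributions are precomputed
# (mocked below); the student's attention weights act as its self-explanation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMStudent(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)              # additive attention scorer
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        h, _ = self.lstm(self.embedding(input_ids))            # (B, T, 2H)
        attn_weights = torch.softmax(self.attn(h).squeeze(-1), dim=-1)  # (B, T)
        context = torch.bmm(attn_weights.unsqueeze(1), h).squeeze(1)    # (B, 2H)
        return self.classifier(context), attn_weights


def distillation_loss(student_logits, student_attn, labels,
                      teacher_logits, teacher_attr,
                      alpha=0.5, beta=0.5, temperature=2.0):
    """Task loss + soft-label KD loss + explanation-alignment loss (weights are assumptions)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    # Align the student's attention with the teacher's normalized token attributions.
    teacher_expl = torch.softmax(teacher_attr, dim=-1)
    xai = F.mse_loss(student_attn, teacher_expl)
    return ce + alpha * kd + beta * xai


# Toy usage with mocked teacher outputs.
batch, seq_len, vocab, classes = 4, 16, 5000, 2
student = BiLSTMStudent(vocab, num_classes=classes)
input_ids = torch.randint(1, vocab, (batch, seq_len))
labels = torch.randint(0, classes, (batch,))
teacher_logits = torch.randn(batch, classes)   # placeholder for the LLM's output logits
teacher_attr = torch.randn(batch, seq_len)     # placeholder per-token XAI attributions

logits, attn = student(input_ids)
loss = distillation_loss(logits, attn, labels, teacher_logits, teacher_attr)
loss.backward()
```

In practice the teacher attributions would come from an actual local XAI method applied to the LLM, and the alignment term could equally use a cosine or KL-based distance; mean squared error is used here only for simplicity.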

Source Journal
Journal of Big Data (Computer Science - Information Systems)
CiteScore: 17.80
Self-citation rate: 3.70%
Articles per year: 105
Review time: 13 weeks
Journal description: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.