Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications

Ali Maatouk, Kenny Chirino Ampudia, Rex Ying, Leandros Tassiulas

arXiv:2409.05314 · arXiv - MATH - Information Theory · 2024-09-09
Abstract
The emergence of large language models (LLMs) has significantly impacted various fields, from natural language processing to sectors such as medicine and finance. Despite their rapid proliferation, however, applications of LLMs in telecommunications remain limited, often relying on general-purpose models that lack domain-specific specialization. This lack of specialization results in underperformance, particularly when dealing with telecommunications-specific technical terminology and its associated mathematical representations. This paper addresses this gap by first creating and disseminating Tele-Data, a comprehensive dataset of telecommunications material curated from relevant sources, and Tele-Eval, a large-scale question-and-answer dataset tailored to the domain.
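As a rough illustration, both datasets can presumably be pulled from the Hugging Face Hub with the `datasets` library; the repository IDs, splits, and column names below are assumptions made for illustration, not details taken from the abstract.

```python
# Minimal sketch: loading the released corpora with the Hugging Face
# `datasets` library. Repository IDs and column names are assumed.
from datasets import load_dataset

tele_data = load_dataset("AliMaatouk/Tele-Data", split="train")  # assumed repo ID
tele_eval = load_dataset("AliMaatouk/Tele-Eval", split="train")  # assumed repo ID

print(tele_data)      # inspect the available columns (e.g. raw telecom text)
print(tele_eval[0])   # inspect one question-and-answer record
```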
Through extensive experiments, we explore the most effective training techniques for adapting LLMs to the telecommunications domain, ranging from examining the division of expertise across various telecommunications aspects to employing parameter-efficient techniques.
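A minimal sketch of what a parameter-efficient adaptation setup can look like, using LoRA through the `peft` library; the base model, target modules, and hyperparameters are illustrative placeholders rather than the paper's actual configuration.

```python
# Sketch of parameter-efficient domain adaptation with LoRA (one of several
# possible PEFT techniques). Base model and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_id)  # used to tokenize domain text
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights is trainable

# The wrapped model can then be trained on telecom text (e.g. Tele-Data)
# with any standard causal-LM training loop or the `transformers` Trainer.
```

The appeal of this style of adaptation is that only the low-rank adapter weights are updated, keeping memory and compute requirements modest compared with full fine-tuning.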
We also investigate how models of different sizes behave during adaptation and analyze the impact of their training data on this behavior. Leveraging these findings, we develop and open-source Tele-LLMs, the first series of language models, ranging from 1B to 8B parameters, tailored specifically to telecommunications. Our evaluations demonstrate that these models outperform their general-purpose counterparts on Tele-Eval while retaining their previously acquired capabilities, thereby avoiding catastrophic forgetting.
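To make the kind of comparison described above concrete, one could probe a domain question and a general-knowledge question with both a general-purpose model and its telecom-adapted counterpart; the model IDs below, including AliMaatouk/Tele-LLM-1B, are assumed names used purely for illustration.

```python
# Sketch of a side-by-side check: does the adapted model answer telecom
# questions better while still answering general questions? Model IDs are
# illustrative placeholders, not confirmed release names.
from transformers import AutoModelForCausalLM, AutoTokenizer

def answer(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

telecom_q = "Question: What is the role of the RRC layer in 5G NR? Answer:"
general_q = "Question: Who wrote the novel Moby-Dick? Answer:"

for model_id in ["meta-llama/Llama-3.2-1B", "AliMaatouk/Tele-LLM-1B"]:  # assumed IDs
    print(model_id)
    print(answer(model_id, telecom_q))  # domain knowledge gained?
    print(answer(model_id, general_q))  # general knowledge retained?
```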