Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications

Ali Maatouk, Kenny Chirino Ampudia, Rex Ying, Leandros Tassiulas

arXiv:2409.05314 · arXiv - MATH - Information Theory · 2024-09-09
Abstract
The emergence of large language models (LLMs) has significantly impacted various fields, from natural language processing to sectors such as medicine and finance. Despite their rapid proliferation, however, applications of LLMs in telecommunications remain limited, often relying on general-purpose models that lack domain-specific specialization. This lack of specialization results in underperformance, particularly when dealing with telecommunications-specific technical terminology and its associated mathematical representations. This paper addresses this gap by first creating and disseminating Tele-Data, a comprehensive dataset of telecommunications material curated from relevant sources, and Tele-Eval, a large-scale question-and-answer dataset tailored to the domain.
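As a rough illustration, both datasets can presumably be pulled from the Hugging Face Hub with the `datasets` library; the repository IDs, splits, and column names below are assumptions made for illustration, not details taken from the abstract.

```python
# Minimal sketch: loading the released corpora with the Hugging Face
# `datasets` library. Repository IDs and column names are assumed.
from datasets import load_dataset

tele_data = load_dataset("AliMaatouk/Tele-Data", split="train")  # assumed repo ID
tele_eval = load_dataset("AliMaatouk/Tele-Eval", split="train")  # assumed repo ID

print(tele_data)      # inspect the available columns (e.g. raw telecom text)
print(tele_eval[0])   # inspect one question-and-answer record
```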
Through extensive experiments, we explore the most effective training techniques for adapting LLMs to the telecommunications domain, ranging from examining the division of expertise across various telecommunications aspects to employing parameter-efficient techniques.
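A minimal sketch of what a parameter-efficient adaptation setup can look like, using LoRA through the `peft` library; the base model, target modules, and hyperparameters are illustrative placeholders rather than the paper's actual configuration.

```python
# Sketch of parameter-efficient domain adaptation with LoRA (one of several
# possible PEFT techniques). Base model and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_id)  # used to tokenize domain text
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights is trainable

# The wrapped model can then be trained on telecom text (e.g. Tele-Data)
# with any standard causal-LM training loop or the `transformers` Trainer.
```

The appeal of this style of adaptation is that only the low-rank adapter weights are updated, keeping memory and compute requirements modest compared with full fine-tuning.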
We also investigate how models of different sizes behave during adaptation and analyze the impact of their training data on this behavior. Leveraging these findings, we develop and open-source Tele-LLMs, the first series of language models, ranging from 1B to 8B parameters, tailored specifically to telecommunications. Our evaluations demonstrate that these models outperform their general-purpose counterparts on Tele-Eval while retaining their previously acquired capabilities, thereby avoiding catastrophic forgetting.
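To make the kind of comparison described above concrete, one could probe a domain question and a general-knowledge question with both a general-purpose model and its telecom-adapted counterpart; the model IDs below, including AliMaatouk/Tele-LLM-1B, are assumed names used purely for illustration.

```python
# Sketch of a side-by-side check: does the adapted model answer telecom
# questions better while still answering general questions? Model IDs are
# illustrative placeholders, not confirmed release names.
from transformers import AutoModelForCausalLM, AutoTokenizer

def answer(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

telecom_q = "Question: What is the role of the RRC layer in 5G NR? Answer:"
general_q = "Question: Who wrote the novel Moby-Dick? Answer:"

for model_id in ["meta-llama/Llama-3.2-1B", "AliMaatouk/Tele-LLM-1B"]:  # assumed IDs
    print(model_id)
    print(answer(model_id, telecom_q))  # domain knowledge gained?
    print(answer(model_id, general_q))  # general knowledge retained?
```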