Generating P4 data planes using LLMs
Mihai-Valentin Dumitru, Vlad-Andrei Bădoiu, Alexandru M. Gherghescu, Costin Raiciu
Computer Networks, vol. 272, Article 111709, published 2025-09-13. DOI: 10.1016/j.comnet.2025.111709
Citations: 0
Abstract
Over the past few years, Large Language Models (LLMs) have produced impressive results in code generation. However, most research focuses on widely adopted general-purpose programming languages, with little attention given to niche domain-specific languages (DSLs). This raises the question: do DSLs such as P4, a data plane programming language, have a place in the LLM world?
The potential impact of generating DSL code could be tremendous. Automatically generating data plane code promises flexible networks that can quickly adapt to specific conditions at the lowest level. P4 is structurally simpler than general-purpose languages, but its corpus of existing programs is much smaller, setting up interesting challenges for deep-learning-based code generation.
In this paper, we show that crafting a highly specialized P4 dataset with domain knowledge is sufficient to bootstrap P4 code generation through fine-tuning existing LLMs, even when they have not encountered P4 code during pre-training. We further document the process of creating a relevant benchmark to assess the proficiency of fine-tuned models in generating P4 code. Our evaluation shows that our fine-tuned models outperform much larger models in both syntactic correctness and semantic alignment.
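The abstract does not describe the evaluation harness, but a natural way to measure syntactic correctness is to run each generated program through an open-source P4 compiler front end. The sketch below is an illustrative Python helper, not the authors' benchmark code; it assumes the `p4test` binary from the p4c project is installed and that generated programs target P4_16.

```python
import pathlib
import subprocess
import tempfile


def p4_compiles(p4_source: str, timeout_s: int = 30) -> bool:
    """Check whether a generated P4_16 program is accepted by the compiler front end.

    Writes the candidate program to a temporary file and invokes `p4test`
    (the front-end-only compiler shipped with p4c). A zero exit code means
    the program passed parsing and type checking.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "candidate.p4"
        path.write_text(p4_source)
        try:
            result = subprocess.run(
                ["p4test", str(path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical batch of model completions to score.
    completions = ["/* model output 1 */", "/* model output 2 */"]
    passed = sum(p4_compiles(src) for src in completions)
    print(f"syntactic pass rate: {passed}/{len(completions)}")
```

A compile check of this kind only captures syntactic correctness; assessing semantic alignment, as the paper does, additionally requires comparing the generated program's behavior against the task intent.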
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.