Generating P4 data planes using LLMs

Impact factor 4.6 · CAS Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

Computer Networks, Volume 272, Article 111709 · Published: 2025-09-13 · DOI: 10.1016/j.comnet.2025.111709

Mihai-Valentin Dumitru, Vlad-Andrei Bădoiu, Alexandru M. Gherghescu, Costin Raiciu


Citations: 0

Abstract




Over the past few years, Large Language Models (LLMs) have produced impressive results in code generation. However, most research focuses on widely adopted general-purpose programming languages, with little attention given to niche domain-specific languages (DSLs). This raises the question: do DSLs such as P4, a data plane programming language, have a place in the LLM world?

The potential impact of generating DSL code could be tremendous. Automatically generating data plane code promises flexible networks that can quickly adapt to specific conditions at the lowest level. P4 is structurally simpler than general-purpose languages, but it also has a much smaller corpus of existing programs, which poses interesting challenges for deep-learning-based code generation.

In this paper, we show that crafting a highly specialized P4 dataset with domain knowledge is sufficient to bootstrap P4 code generation by fine-tuning existing LLMs, even when they have not encountered P4 code during pre-training. We further document the process of creating a relevant benchmark to assess how proficiently fine-tuned models generate P4 code. Our evaluation shows that our fine-tuned models outperform much larger models in both syntactic correctness and semantic alignment.
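To make the notion of syntactic correctness concrete, here is a minimal sketch that scores a batch of generated programs by whether a P4 compiler accepts them. It is not the authors' benchmark: the use of the open-source `p4c` driver, the `bmv2`/`v1model` target, and the helper names `p4c_compiles` and `syntactic_correctness_rate` are assumptions for illustration only, and semantic alignment is not measured here.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterable


def p4c_compiles(source: str) -> bool:
    """Return True if the program compiles for BMv2 (v1model).

    Hypothetical helper: assumes the open-source `p4c` driver is installed
    and on PATH. The target/arch choice is illustrative, not the paper's setup.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.p4")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        # Run inside the temp dir so any compiler output files are discarded with it.
        result = subprocess.run(
            ["p4c", "--target", "bmv2", "--arch", "v1model", path],
            cwd=tmp,
            capture_output=True,
            text=True,
        )
        return result.returncode == 0


def syntactic_correctness_rate(
    programs: Iterable[str],
    compiles: Callable[[str], bool] = p4c_compiles,
) -> float:
    """Fraction of generated programs accepted by the checker (0.0 for an empty batch).

    Hypothetical metric: the checker is pluggable so it can be swapped for
    another compiler invocation or a stub in tests.
    """
    outcomes = [compiles(p) for p in programs]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Keeping the checker as a parameter separates the scoring logic from the toolchain, so the same function can be exercised with a stub (for example `lambda src: "control" in src`) on machines without `p4c`.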


Source journal: Computer Networks (Engineering & Technology: Telecommunications)

CiteScore: 10.80
Self-citation rate: 3.60%
Articles published: 434
Review time: 8.6 months

About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.