Decomposition then watermarking: Enhancing code traceability with dual-channel code watermarking
Haibo Lin, Zhong Li, Ruihua Ji, Minxue Pan, Tian Zhang, Nan Wu, Xuandong Li
Automated Software Engineering, vol. 33, no. 1. Published 2025-10-10. Impact factor: 3.1 (JCR Q3, Computer Science, Software Engineering).
DOI: 10.1007/s10515-025-00561-1
https://link.springer.com/article/10.1007/s10515-025-00561-1
Abstract
With the rapid growth of the open-source community, code watermarking has gained increasing attention as a way to trace the provenance of code. Existing work on code watermarking has shown promising results yet still falls short, especially when a multi-bit watermark is required to encode diverse information. In this paper, we propose DWC, a novel code watermarking method with high watermark capacity. The key idea of DWC is to first decompose the code into natural and formal channels, then embed the watermark separately into each channel based solely on that channel's own information. As such, DWC reduces both the mutual interference between the two channels and the impact of irrelevant information within the code, thus enabling more effective transformations for embedding watermarks with higher capacity and robustness. Our extensive experiments on source code snippets in four programming languages (C, C++, Java, and Python) demonstrate the effectiveness, efficiency, and capability of DWC in embedding multi-bit watermarks, as well as the utility and robustness of the watermarked code it generates.
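The abstract does not describe DWC's actual transformations, so the following is only a toy illustration of the general dual-channel idea, not the authors' method. It assumes the natural channel is an identifier's naming style and the formal channel is a choice between two equivalent statement forms; the names `embed`, `extract`, `NaturalChannel`, `FormalChannel`, and `SAMPLE` are all hypothetical. Each channel carries one bit and is read back independently of the other.

```python
import ast

def to_camel(name):
    """Convert snake_case to camelCase (assumed natural-channel variant)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

class NaturalChannel(ast.NodeTransformer):
    """Toy natural channel: bit 1 renames `target` to camelCase, bit 0 keeps it."""
    def __init__(self, target, bit):
        self.target = target
        self.new_name = to_camel(target) if bit else target

    def visit_Name(self, node):
        if node.id == self.target:
            node.id = self.new_name
        return node

class FormalChannel(ast.NodeTransformer):
    """Toy formal channel: bit 1 rewrites `x += e` as `x = x + e`, bit 0 keeps it."""
    def __init__(self, bit):
        self.bit = bit

    def visit_AugAssign(self, node):
        if self.bit and isinstance(node.op, ast.Add) and isinstance(node.target, ast.Name):
            load = ast.Name(id=node.target.id, ctx=ast.Load())
            value = ast.BinOp(left=load, op=ast.Add(), right=node.value)
            return ast.copy_location(ast.Assign(targets=[node.target], value=value), node)
        return node

def embed(source, target, natural_bit, formal_bit):
    """Embed one bit per channel; each transformer looks only at its own channel."""
    tree = ast.parse(source)
    tree = FormalChannel(formal_bit).visit(tree)
    tree = NaturalChannel(target, natural_bit).visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))

def extract(source, target):
    """Recover (natural_bit, formal_bit) from watermarked source."""
    nodes = list(ast.walk(ast.parse(source)))
    names = {n.id for n in nodes if isinstance(n, ast.Name)}
    camel = to_camel(target)
    natural_bit = int(camel != target and camel in names)
    formal_bit = int(any(
        isinstance(n, ast.Assign)
        and isinstance(n.targets[0], ast.Name)
        and isinstance(n.value, ast.BinOp)
        and isinstance(n.value.op, ast.Add)
        and isinstance(n.value.left, ast.Name)
        and n.value.left.id == n.targets[0].id
        for n in nodes
    ))
    return natural_bit, formal_bit

SAMPLE = """
def total(values):
    running_total = 0
    for v in values:
        running_total += v
    return running_total
"""

if __name__ == "__main__":
    marked = embed(SAMPLE, "running_total", natural_bit=1, formal_bit=1)
    print(marked)
    print(extract(marked, "running_total"))
```

Because the two transformers touch disjoint aspects of the code, either bit can be changed without disturbing the other, and the watermarked function still computes the same result. A real scheme would need far more transformation sites to reach multi-bit capacity and would have to withstand adversarial rewriting, which this sketch does not attempt.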
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.