UNA: Improving Automated PL-NL System by A Unified Neural Architecture
Dawei Yuan, Tao Zhang, He Jiang
IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3585-3599, 2025 (published 2025-03-04)
DOI: 10.1109/TR.2025.3541087
https://ieeexplore.ieee.org/document/10909995/
Citations: 0
Abstract
With the extensive application of artificial intelligence (AI) technologies, automated programming language-natural language (PL-NL) systems have gained significant attention, driving a series of related tasks that serve developers and users, such as code search and code summarization. Mainstream PL-NL systems currently treat PL-NL as bimodal data and use two separate neural architectures (e.g., recurrent neural networks) to learn representations of PL and NL and to build the semantic relations between them, which improves performance on these tasks. However, two issues limit the representation-learning ability of these systems: first, large vocabularies cause data sparsity and limit what neural architectures can learn; second, source code and natural language do not always correspond one to one. To address these two issues, this article introduces the unified neural architecture (UNA), which is built on a unified subword-level vocabulary (Uni-Vocab) to provide high-quality PL-NL services. Uni-Vocab gives PL and NL a unified modal encoding, which lets us effectively control the vocabulary size and solve the data sparsity problem. On this basis, UNA learns a unified contextual representation of PL and NL, which helps build unified semantic relations between them. To validate the effectiveness of UNA, we perform experiments on code search and code summarization, two PL-NL tasks that serve developers and users. Experimental results show that UNA yields notable performance improvements: the baseline approaches for these two tasks improve by up to 36.09% in mean reciprocal rank (MRR) and 18.02% in bilingual evaluation understudy (BLEU), respectively.
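The abstract does not say which subword algorithm builds Uni-Vocab. As a rough illustration of the general idea only, the sketch below learns byte-pair-encoding (BPE) merges over one mixed corpus of code and natural-language text, so both modalities are segmented with a single vocabulary whose size is capped by the number of merges. BPE, the identifier-splitting regex, the `</w>` end-of-word marker, and the function names (`pretokenize`, `apply_merge`, `learn_bpe`, `encode`, `vocabulary`) are all assumptions for illustration, not UNA's actual implementation.

```python
import re
from collections import Counter

# Hypothetical sketch: BPE is an assumed stand-in for the Uni-Vocab construction.
END = "</w>"  # end-of-word marker; keeps merges from crossing word boundaries


def pretokenize(text):
    """Split code or NL text into lowercase words (assumed preprocessing).

    camelCase and snake_case identifiers are split, so `sortList` in code and
    "sort the list" in a comment yield the same words.
    """
    pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|\S", text)
    return [p.lower() for p in pieces]


def apply_merge(symbols, pair):
    """Replace each left-to-right occurrence of `pair` with the merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merges over one mixed PL+NL corpus."""
    words = Counter()
    for line in corpus:
        for w in pretokenize(line):
            words[tuple(w) + (END,)] += 1
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        words = Counter({apply_merge(s, best): f for s, f in words.items()})
    return merges


def encode(text, merges):
    """Segment code or NL text into subwords from the shared vocabulary."""
    tokens = []
    for w in pretokenize(text):
        symbols = tuple(w) + (END,)
        for pair in merges:  # replay merges in the order they were learned
            symbols = apply_merge(symbols, pair)
        tokens.extend(symbols)
    return tokens


def vocabulary(corpus, merges):
    """Shared subword vocabulary: base characters, END, one symbol per merge."""
    chars = {c for line in corpus for w in pretokenize(line) for c in w}
    return chars | {END} | {a + b for a, b in merges}


if __name__ == "__main__":
    corpus = [
        "def sortList(items): return sorted(items)",
        "sort the list of items in ascending order",
    ]
    merges = learn_bpe(corpus, num_merges=20)
    print(len(vocabulary(corpus, merges)), "shared subword types")
    print(encode("def sortItems(items):", merges))
```

Capping `num_merges` is what bounds the vocabulary in this sketch: it can hold at most the distinct base characters, the end-of-word marker, and one new symbol per merge, while rare identifiers still decompose into known subwords instead of becoming out-of-vocabulary tokens.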
About the journal:
IEEE Transactions on Reliability is a refereed journal for reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, and from techniques for making things better to ways of predicting and measuring behavior in the field. Because reliability is an engineering subject that supports new and existing technologies, the journal constantly expands into new areas of the assurance sciences.