UNA: Improving Automated PL-NL System by A Unified Neural Architecture

IF 5.7 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Hardware & Architecture)
Dawei Yuan;Tao Zhang;He Jiang
DOI: 10.1109/TR.2025.3541087
Journal: IEEE Transactions on Reliability, 74(3), pp. 3585-3599
Published: 2025-03-04 (Journal Article)
Open access: no
URL: https://ieeexplore.ieee.org/document/10909995/
Citations: 0

Abstract

With the extensive application of artificial intelligence (AI) technologies, automated programming language-natural language (PL-NL) systems have gained significant attention, driving a series of related tasks that serve developers and users, such as code search and code summarization. Currently, mainstream PL-NL systems regard PL-NL as bimodal data and use two separate neural architectures (e.g., recurrent neural networks) to learn representations of PL-NL and build their semantic relations, improving performance on these tasks. However, two issues limit the representation-learning ability of these systems: first, large vocabularies cause data sparsity and limit the learning ability of neural architectures; second, there is not always a one-to-one correspondence between source code and natural language. To address these two issues, in this article we introduce the unified neural architecture (UNA), which builds a unified vocabulary (Uni-Vocab) at the subword level to provide high-quality PL-NL services. In the Uni-Vocab, we build a unified modal encoding for PL-NL, which allows us to effectively control the vocabulary size and solve the data sparsity problem. UNA can then learn a unified contextual representation of PL-NL, which helps build their unified semantic relations. To validate the effectiveness of the proposed UNA, we perform experiments on code search and code summarization, two PL-NL tasks for developers and users. Experimental results demonstrate that UNA achieves noteworthy performance improvements. In detail, the baseline approaches in these two tasks improve by up to 36.09% and 18.02% in terms of mean reciprocal rank (MRR) and bilingual evaluation understudy (BLEU), respectively.
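For readers unfamiliar with the code search metric named in the abstract, the following is a minimal sketch of how mean reciprocal rank is conventionally computed. The function name, the input format (one 1-based rank per query, `None` when no correct result is retrieved), and the example ranks are illustrative assumptions, not taken from the paper or its evaluation scripts.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the first correct code snippet
    returned for query i, or None if no correct snippet was retrieved
    (such a query contributes 0 to the sum).
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a miss adds nothing but still counts in the denominator
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks for four queries: hit at 1, hit at 2, miss, hit at 4.
example_ranks = [1, 2, None, 4]
print(f"MRR = {mean_reciprocal_rank(example_ranks):.4f}")  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

A higher value means correct snippets tend to appear nearer the top of the result list; a value of 1.0 means every query had its correct snippet ranked first.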
Source journal
IEEE Transactions on Reliability (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 12.20
Self-citation rate: 8.50%
Articles published: 153
Review time: 7.5 months
Journal description: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.