Interpretable Code Summarization

Impact factor: 5.0; Zone 2 (Computer Science); JCR Q1 (Computer Science, Hardware & Architecture)

IEEE Transactions on Reliability; Pub Date: 2024-03-14; DOI: 10.1109/TR.2024.3392876

Md Sarwar Kamal; Sonia Farhana Nimmy; Nilanjan Dey

IEEE Transactions on Reliability, vol. 74, no. 1, pp. 2280-2289. Journal article. https://ieeexplore.ieee.org/document/10530504/

Citations: 0

Abstract

Code summarization is the process of generating readable natural-language descriptions from source code. It has become a popular research topic for software maintenance, code generation, and code recovery. Existing code summarization methods follow an encoder-decoder approach and use various machine learning techniques to generate natural language from source code. Although most of these methods are state of the art, it is difficult to understand how their complex encoding and decoding processes map source code tokens to natural language words. These encoder-decoder approaches are therefore treated as opaque (black-box) models. This research proposes explainable AI methods that overcome the black-box nature of token mapping in the code summarization process. We first create an abstract syntax tree (AST) from the tokens of the source code. We then embed the AST into natural language words using a bilingual statistical probability approach, which generates candidate statistical parse trees. We apply a PageRank algorithm to rank these parse trees, and from the best-ranked tree we generate the comment for the corresponding code snippet. To explain our generation method, we use a Takagi–Sugeno fuzzy approach, layerwise relevance propagation, and a hidden Markov model. These approaches make our method trustworthy and help humans understand how source code tokens are mapped to natural language words.
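The first and third steps of the pipeline described in the abstract (building an AST, then ranking candidates with PageRank) can be illustrated with a minimal sketch. It is not the authors' implementation. Python's standard `ast` module stands in for the paper's AST construction. The candidate sentences, the Jaccard token-overlap edge weights, and the helper names (`ast_node_types`, `rank_candidates`) are assumptions made for illustration only, since the abstract does not say how the graph over parse trees is built. The bilingual statistical mapping and the explanation methods are not modeled here.

```python
import ast


def ast_node_types(source: str) -> list[str]:
    """Parse Python source and return the AST node type names in walk order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity (an assumed edge weight, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights: list[list[float]], damping: float = 0.85,
             iterations: int = 100) -> list[float]:
    """Power-iteration PageRank; weights[i][j] is the edge weight from i to j."""
    n = len(weights)
    ranks = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_ranks = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                if out_sums[i] > 0:
                    incoming += ranks[i] * weights[i][j] / out_sums[i]
                else:
                    # Dangling node: spread its rank evenly over all nodes.
                    incoming += ranks[i] / n
            new_ranks.append((1.0 - damping) / n + damping * incoming)
        ranks = new_ranks
    return ranks


def rank_candidates(candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate summaries by PageRank over a token-overlap graph."""
    token_sets = [set(c.lower().split()) for c in candidates]
    n = len(candidates)
    weights = [
        [jaccard(token_sets[i], token_sets[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    return sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)


# Hypothetical input snippet and candidate summaries.
snippet = "def add(a, b):\n    return a + b\n"
node_types = ast_node_types(snippet)

candidates = [
    "add two numbers and return the sum",
    "return the sum of two numbers",
    "open a file and read lines",
]
ranked = rank_candidates(candidates)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

In this toy graph, a candidate that shares many tokens with the others accumulates more rank, so an outlier sentence ends up last. A faithful reproduction would replace the token-overlap weights with whatever relation the paper defines among its statistical parse trees.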


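Source journal

IEEE Transactions on Reliability (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 12.20

Self-citation rate: 8.50%

Articles published: 153

Review time: 7.5 months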

Journal description: IEEE Transactions on Reliability is a refereed journal for reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.