An alternative to code comment generation? Generating comment from bytecode
Xiangping Chen, Junqi Chen, Zhilu Lian, Yuan Huang, Xiaocong Zhou, Yunzhi Wu, Zibin Zheng
Information and Software Technology, Volume 179, Article 107623 (published 2024-11-22)
DOI: 10.1016/j.infsof.2024.107623
https://www.sciencedirect.com/science/article/pii/S0950584924002283
Context:
Because code comments are important and often necessary, recent work has proposed many comment generation models that take source code as input. Sometimes, however, the source code cannot be obtained and only the bytecode is available, as with many apps.
Objective:
If comments could be generated directly from bytecode, tasks such as malware detection and the comprehension of closed-source software would benefit, because the generated comments make the system easier to understand. We therefore propose ByteGen, a novel approach that generates comments from bytecode.
Methods:
Specifically, ByteGen captures the structural characteristics of bytecode through the control flow graph (CFG) of the bytecode. It serializes the CFG with a dedicated traversal called enhanced SBT, which preserves the complete structure of the CFG in a single sequence. We evaluate the approach on a dataset of about 50,000 bytecode-comment pairs collected from Maven.
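The Methods paragraph names two steps: obtaining a CFG from the bytecode and serializing it with enhanced SBT. The abstract does not describe how enhanced SBT works, so the sketch below is only an illustration under stated assumptions and is not ByteGen's algorithm. It borrows the bracket-nesting idea that SBT uses on ASTs in earlier comment-generation work and applies it to a CFG whose basic blocks are hand-written lists of JVM mnemonics. It keeps edges to already-visited blocks (loop back edges, join points) as hypothetical `REF_<block>` tokens. The function names `sbt_cfg` and `edges_from_tokens`, the block ids, and the token scheme are all hypothetical. The decoder illustrates what "preserving the complete structure in a sequence" can mean: every edge reachable from the entry block can be recovered from the tokens.

```python
from typing import Dict, List, Set

# Hypothetical CFG encoding: block id -> JVM mnemonics in that basic block,
# and block id -> successor block ids (in branch order).
Blocks = Dict[str, List[str]]
Edges = Dict[str, List[str]]


def sbt_cfg(entry: str, blocks: Blocks, edges: Edges) -> List[str]:
    """Serialize a CFG into an SBT-style token sequence (illustrative only).

    Depth-first tree edges are encoded by bracket nesting, as in SBT for ASTs.
    An edge to an already-visited block (loop back edge or join point) is
    emitted as a REF_<id> token so that no edge is dropped. Blocks that are
    unreachable from `entry` are not serialized.
    """
    visited: Set[str] = set()
    tokens: List[str] = []

    def visit(block: str) -> None:
        visited.add(block)
        tokens.extend(["(", block])
        tokens.extend(blocks[block])  # instructions of this basic block
        for succ in edges.get(block, []):
            if succ in visited:
                tokens.append(f"REF_{succ}")  # non-tree edge kept explicitly
            else:
                visit(succ)  # tree edge encoded by nesting
        tokens.extend([")", block])

    visit(entry)
    return tokens


def edges_from_tokens(tokens: List[str]) -> Edges:
    """Recover successor lists from a sequence produced by sbt_cfg."""
    recovered: Edges = {}
    stack: List[str] = []  # blocks whose bracket is currently open
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            block = tokens[i + 1]
            if stack:  # nesting encodes a parent -> child edge
                recovered.setdefault(stack[-1], []).append(block)
            stack.append(block)
            i += 2
        elif tok == ")":
            stack.pop()
            i += 2
        elif tok.startswith("REF_"):
            recovered.setdefault(stack[-1], []).append(tok[len("REF_"):])
            i += 1
        else:
            i += 1  # instruction mnemonic inside the current block
    return recovered


# Hand-written toy CFG for: int i = 0; while (i < 10) { i++; } return;
# (operands such as the constant 10 are omitted; only mnemonics are kept)
EXAMPLE_BLOCKS: Blocks = {
    "B0": ["iconst_0", "istore_1"],
    "B1": ["iload_1", "bipush", "if_icmpge"],
    "B2": ["iinc", "goto"],
    "B3": ["return"],
}
EXAMPLE_EDGES: Edges = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"]}

if __name__ == "__main__":
    seq = sbt_cfg("B0", EXAMPLE_BLOCKS, EXAMPLE_EDGES)
    print(" ".join(seq))
    print(edges_from_tokens(seq) == EXAMPLE_EDGES)
```

For the loop above, the sequence is `( B0 iconst_0 istore_1 ( B1 iload_1 bipush if_icmpge ( B2 iinc goto REF_B1 ) B2 ( B3 return ) B3 ) B1 ) B0`, and the decoded edges match the original. A plain depth-first SBT over a graph would lose the `B2 -> B1` back edge. Explicit reference tokens are one simple way to avoid that loss.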
Results:
Experimental results show that ByteGen achieves an average BLEU-4 score of 28.67, outperforming several baselines. A human study further indicates that ByteGen is effective at generating comments from bytecode.
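The 28.67 figure is presumably an average of per-comment BLEU-4 scores on a 0 to 100 scale. The abstract does not say which BLEU variant is used. The sketch below therefore assumes sentence-level BLEU-4 with whitespace tokenization, uniform weights over 1- to 4-gram precisions, the standard brevity penalty, and add-one smoothing for n ≥ 2 (the "BLEU+1" style). The helper names `sentence_bleu4` and `average_bleu4` are hypothetical. Other smoothing choices give different absolute numbers, so scores from this sketch are not directly comparable to the reported value.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Sentence-level BLEU-4 in [0, 1] with add-one smoothing for n >= 2 (assumed variant)."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, 5):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(len(candidate) - n + 1, 0)
        if n > 1:  # add-one smoothing so short outputs do not collapse to 0
            matches += 1
            total += 1
        if matches == 0:  # only reachable for n == 1: no word overlap at all
            return 0.0
        log_sum += 0.25 * math.log(matches / total)  # uniform weights -> geometric mean
    # Brevity penalty for candidates that are not longer than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


def average_bleu4(pairs: List[Tuple[str, str]]) -> float:
    """Mean sentence BLEU-4 over (reference, generated) comment pairs, scaled to 0-100."""
    scores = [sentence_bleu4(ref.split(), gen.split()) for ref, gen in pairs]
    return 100.0 * sum(scores) / len(scores)
```

Averaging per-sentence scores, rather than pooling n-gram counts into a single corpus-level BLEU, is itself an assumption. Both conventions appear in comment-generation evaluations.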
Conclusion:
Overall, ByteGen outperforms the baselines, which demonstrates the effectiveness of our approach for code comment generation when source code is unavailable.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.