Bash command comment generation via multi-scale heterogeneous feature fusion
Authors: Junsan Zhang, Yang Zhu, Ao Lu, Yudie Yan, Yao Wan
Journal: Automated Software Engineering, vol. 32, no. 1 (published 2025-03-04)
Impact factor: 2.0 (JCR Q3, Computer Science, Software Engineering)
DOI: 10.1007/s10515-025-00494-9
URL: https://link.springer.com/article/10.1007/s10515-025-00494-9
Citations: 0
Abstract
Automatic generation of Bash command comments is crucial for understanding and updating commands in software maintenance. Existing mainstream methods mainly learn from the sequential text of Bash commands and combine it with retrieval-enhanced techniques to generate comments. However, these methods overlook the syntactic structure of Bash commands, which limits the quality and accuracy of the generated comments. This paper proposes a heterogeneous Bash comment generation framework named HBCom, which mines the semantic information of Bash commands from both command token sequences and syntactic structures to generate more accurate and natural command comments. The core of HBCom is a Heterogeneous Information Graph (HIG) built on an Abstract Syntax Tree, which integrates the syntactic structure of Bash commands with the code sequence through six types of edges and provides the information basis for subsequent comment generation. In addition, we propose a heterogeneous and multi-scale graph neural network to capture the various relationships in HIGs. We then use a Transformer decoder, combined with a copy mechanism based on multi-head attention, to decode and fuse the HIG features and the Bash command token features and generate the final comments. We conduct extensive experiments on a Bash dataset, demonstrating that HBCom outperforms the compared baseline models on the BLEU, ROUGE-L, and METEOR metrics. Furthermore, human evaluations confirm HBCom’s effectiveness in real-world application scenarios.
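The abstract does not spell out how the copy mechanism combines its inputs, so the following is only a minimal sketch of the general pointer-style idea, not HBCom's implementation. It assumes that attention weights from several heads are averaged into one copy distribution over the source command tokens, and that a scalar gate `p_gen` blends this with the decoder's vocabulary distribution. The function name `copy_mix`, the gate, the head averaging, and all numbers are hypothetical.

```python
from collections import defaultdict


def copy_mix(p_vocab, head_attentions, source_tokens, p_gen):
    """Blend a vocabulary distribution with an attention-based copy distribution.

    p_vocab: dict mapping token -> probability from the decoder softmax
    head_attentions: list of per-head attention weight lists, each aligned
        with source_tokens and summing to 1 (assumed, not from the paper)
    source_tokens: tokens of the input Bash command
    p_gen: probability in [0, 1] of generating from the vocabulary
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be in [0, 1]")
    for weights in head_attentions:
        if len(weights) != len(source_tokens):
            raise ValueError("each head must align with source_tokens")

    # Average the heads into a single copy distribution (an assumption).
    n_heads = len(head_attentions)
    copy_weights = [sum(col) / n_heads for col in zip(*head_attentions)]

    final = defaultdict(float)
    # Generation path: scale vocabulary probabilities by p_gen.
    for token, prob in p_vocab.items():
        final[token] += p_gen * prob
    # Copy path: add attention mass to the matching source tokens,
    # which lets out-of-vocabulary tokens such as "*.log" be emitted.
    for token, weight in zip(source_tokens, copy_weights):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Illustrative values only.
source = ["find", ".", "-name", "*.log"]
p_vocab = {"find": 0.1, "files": 0.6, "named": 0.3}
heads = [
    [0.0, 0.2, 0.1, 0.7],
    [0.2, 0.0, 0.1, 0.7],
]
dist = copy_mix(p_vocab, heads, source, p_gen=0.5)
best = max(dist, key=dist.get)
```

The point of the blend is that a token absent from the output vocabulary, such as a file pattern in the command, can still receive probability mass through the copy path.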
About the journal:
This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.