Giacomo Fantino, Antonio Vetro’, Marco Torchiano, Federica Cappelluti
Beyond syntax: enhancing automated documentation with data differences
Automated Software Engineering, vol. 33, no. 3 (published 2026-05-04)
DOI: 10.1007/s10515-026-00623-y
https://link.springer.com/article/10.1007/s10515-026-00623-y
PDF: https://link.springer.com/content/pdf/10.1007/s10515-026-00623-y.pdf
Citations: 0
Abstract
Modern software development automation is largely based on AI, covering the entire software development lifecycle, from requirements and code writing to testing and maintenance. Code commenting is no exception. Automated code comment generation methods rely on static syntactic and lexical features of source code. However, these approaches frequently underperform in data-centric software applications, where understanding the effect of code on data is essential. We explore an execution-aware extension to automatic documentation generation. In this exploratory work, we aim to capture post-execution data transformations (i.e., semantic data differences) that reveal the code’s effect on data, and to use them as a complementary signal alongside existing code representations to automate explanatory comments for data wrangling code. We build a curated dataset of Python notebooks from Kaggle and apply a lightweight execution tracer to extract structured descriptions of runtime data transformations. We define a formal grammar for capturing these effects and integrate them into a multimodal encoder-decoder model using co-attention mechanisms. Multiple training strategies are explored to assess the impact of this new modality on comment generation. Our evaluation reveals that models incorporating this modality performed competitively with code-only baselines. Notably, in cases where no observable data transformation occurred, the presence of symbolic ⟨no_diff⟩ signals led to improved robustness and higher comment quality, as measured by both automatic and human evaluation metrics. However, we did not observe improvements in comment quality in semantically rich scenarios, suggesting possible directions for future research.
Qualitative analysis of generated comments supports this pattern, indicating that the modality helps stabilize comments by reducing unnecessary or speculative details in neutral cases, but does not yet provide consistent guidance when meaningful data transformations occur. These trends are less pronounced on a larger, noisier extended test set, suggesting sensitivity to comment–code alignment. Our study demonstrates the feasibility and potential of using execution-derived feedback as a complementary signal in automated comment generation. While the current approach is limited by dataset size and modality noise, it shows that post-execution state changes can guide more context-aware and stable code summarization. This suggests a promising direction for execution-sensitive models in assisting data-centric software development and its documentation.
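The idea of a semantic data difference can be illustrated with a minimal sketch. The function and the diff vocabulary below (`added_columns`, `row_delta`, `null_delta`) are hypothetical illustrations, not the paper's formal grammar: a table snapshot taken before a code cell runs is compared with one taken after, yielding either a structured description of the transformation or a symbolic no-difference token.

```python
# Minimal sketch of an execution-aware "data difference" extractor.
# A table is modeled as a dict of column name -> list of values;
# the diff field names here are illustrative, not the paper's grammar.

def data_diff(before, after):
    """Return a structured description of how `after` differs from `before`."""
    diff = {}
    cols_before, cols_after = set(before), set(after)
    if cols_after - cols_before:
        diff["added_columns"] = sorted(cols_after - cols_before)
    if cols_before - cols_after:
        diff["dropped_columns"] = sorted(cols_before - cols_after)
    rows_before = len(next(iter(before.values()), []))
    rows_after = len(next(iter(after.values()), []))
    if rows_before != rows_after:
        diff["row_delta"] = rows_after - rows_before
    for col in cols_before & cols_after:
        nulls_b = sum(v is None for v in before[col])
        nulls_a = sum(v is None for v in after[col])
        if nulls_b != nulls_a:
            diff.setdefault("null_delta", {})[col] = nulls_a - nulls_b
    # The symbolic token stands in for the paper's <no_diff> signal.
    return diff if diff else "<no_diff>"

# Example: a wrangling step drops rows with missing values
# and adds a derived column.
before = {"age": [25, None, 40], "city": ["Turin", "Rome", None]}
after = {"age": [25], "city": ["Turin"], "is_adult": [True]}
print(data_diff(before, after))
# -> {'added_columns': ['is_adult'], 'row_delta': -2,
#     'null_delta': {'age': -1, 'city': -1}}
```

A structured output like this can be serialized and fed to the generation model as an additional modality alongside the source code, while `<no_diff>` marks cells whose execution left the data unchanged.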
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines automatic systems, collaborative systems, and computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, as well as formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.