Qing Mi, Zhiyou Xiao, Yi Zhan, Liyan Tao, Jiahe Zhang
Journal of Software-Evolution and Process, vol. 37, no. 9. Published 2025-09-03. DOI: 10.1002/smr.70048
https://onlinelibrary.wiley.com/doi/10.1002/smr.70048
Citations: 0
Abstract
Code readability is of central concern for developers, as more readable code indicates higher maintainability, reusability, and portability. In recent years, many deep learning–based code readability classification methods have been proposed. Among them, a graph neural network (GNN)–based model has achieved the best performance in the field of code readability classification. However, it is still unclear what aspects of the model's input lead to its decisions, which hinders its practical use in the software industry. To improve the interpretability of existing code readability classification models and identify key code characteristics that drive their readability predictions, we propose an explanation framework with GNN explainers towards transparent and trustworthy code readability classification. First, we propose a simplified Abstract Syntax Tree (AST)–based code representation method, which transforms Java code snippets into ASTs and discards lower-level nodes with limited information. Then, we retrain the state-of-the-art GNN-based model on our simplified program graphs. Finally, we employ SubgraphX to explain the model's code readability predictions at the subgraph level and visualize the explanation results to further analyze what causes such predictions. The experimental results show that sequential logic, code comments, selection logic, and nested structure are the most influential code characteristics when classifying code snippets as readable or unreadable. Further investigations indicate the model's proficiency in capturing features related to complex logic structures and extensive data flows but point to its limitations in identifying readability issues associated with naming conventions and code formatting. The explainability analysis conducted in this research is the first step towards more transparent and reliable code readability classification.
We believe that our findings offer constructive suggestions to help developers write more readable code and delineate directions for future model improvement.
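To illustrate the kind of simplification the abstract describes, the sketch below builds parent–child edges from an AST while skipping low-information leaf nodes. It is a minimal, hypothetical example using Python's built-in `ast` module as a stand-in for Java parsing: the paper's actual Java tooling, node taxonomy, and pruning rules are not specified here, and the `LOW_INFO` set is an assumption chosen only for demonstration.

```python
import ast

# Hypothetical stand-in for the paper's AST simplification: Python's
# ast module replaces a Java parser, and expression-context leaves
# (Load/Store/Del) play the role of "lower-level nodes with limited
# information" that the simplified representation discards.
LOW_INFO = (ast.Load, ast.Store, ast.Del)

def simplified_edges(tree):
    """Return (parent, child) node-type edges, skipping low-info children."""
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            if isinstance(child, LOW_INFO):
                continue
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

snippet = "def f(x):\n    if x > 0:\n        return x\n    return -x\n"
tree = ast.parse(snippet)
full_edges = sum(1 for _ in ast.walk(tree)) - 1  # a tree has N-1 edges
kept = simplified_edges(tree)
print(f"full AST edges: {full_edges}, simplified edges: {len(kept)}")
```

The resulting edge list could then be fed to a GNN as the program graph; in the paper's pipeline the analogous graphs are built from Java snippets before retraining the classifier.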