A systematic mapping study on graph machine learning for static source code analysis
Authors: Jesse Maarleveld, Jiapan Guo, Daniel Feitosa
Journal: Information and Software Technology, volume 183, Article 107722
Publication date: 2025-03-26
Publication type: Journal Article
DOI: 10.1016/j.infsof.2025.107722
URL: https://www.sciencedirect.com/science/article/pii/S0950584925000618
Impact factor: 3.8; JCR: Q2 (Computer Science, Information Systems)
Citation count: 0
Abstract
Context: In recent years, graph machine learning and particularly graph neural networks have seen successful and widespread applications in many fields, including static source code analysis. Such machine learning techniques enable learning on rich information networks capable of representing different relations and entities.
However, there have been no comprehensive studies investigating the use of graph machine learning for static source code analysis. There is no complete systematic picture of which techniques may be considered tried and tested, and where opportunities for future improvements can still be found.
Objective:
The main goal of this study is to provide a broad overview of the state of the art of static source code analysis using graph machine learning.
Methods:
A systematic mapping was performed covering 4499 studies, from which a final selection of 323 primary studies is presented.
Results:
Among the selected studies, seven major sub-domains were identified. The artefacts used and their combinations, the graph representations, the features, and the machine learning models were collected and categorised.
Conclusions:
The use of graph learning, and in particular graph neural networks, has increased significantly since 2018. Although a wide variety of methods is used, across every dimension we investigated (artefacts, graphs, features, models), we found small sets of technologies which are used in the vast majority of studies. Future opportunities lie in exploring under-explored domains more thoroughly, exploring the use of additional artefacts alongside source code, and paying more attention to interpretability and explainability.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.