Hong Quy Nguyen, Thong Hoang, Hoa Khanh Dam, Guoxin Su, Zhenchang Xing, Qinghua Lu, Jiamou Sun
Journal of Systems and Software, Volume 228, Article 112455. Published 2025-04-26. DOI: 10.1016/j.jss.2025.112455
Impact factor: 3.7. JCR: Q1 (Computer Science, Software Engineering).
https://www.sciencedirect.com/science/article/pii/S0164121225001232
Human-understandable explanation for software vulnerability prediction
Recent advances in deep learning have significantly improved the performance of software vulnerability prediction (SVP). To enhance trustworthiness, the SVP highlights predicted lines of code (LoC) that may be vulnerable. However, providing LoC alone is often insufficient for software practitioners, as it lacks detailed information about the nature of the vulnerability. This paper introduces a novel framework that is built on SVP by offering additional explanatory information based on the suggested LoC. Similar to security reports, our framework comprehensively explains the vulnerability aspects, such as Root Cause, Impact, Attack Vector, and Vulnerability Type. The proposed framework is powered by transformer architectures. Specifically, we leverage pre-trained language models for code to fine-tune on two practical datasets: BigVul and Vulnerability Key Aspect, ensuring our framework’s applicability to real-world scenarios. Experiments using the ROUGE and BLEU scores as evaluation metrics show that our framework achieves better performance with CodeT5+, statistically outperforming a baseline study in generating key vulnerability aspects. Additionally, we conducted a small-scale user study with experienced software practitioners to assess the effectiveness of the framework. The results show that 72% of the participants found our framework helpful in accepting the SVP results, and 68% rated the additional explanations as moderately to extremely useful.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.