{"title":"基于图表示学习和软件同源匹配的JAVA代码漏洞检测技术研究","authors":"Yibin Yang, Xin Bo, Zitong Wang, Xinrui Shao, Xinjie Xie","doi":"10.1145/3590003.3590028","DOIUrl":null,"url":null,"abstract":"In nowadays using different tools and apps is a basic need of people's behavior in life, but the security issues arising from the existence of source code plagiarism among tools and apps are likely to bring huge losses to companies and even countries, so detecting the existence of vulnerabilities or malicious code in software becomes an important part of protecting information and detecting software security. This project is based on the aspect of JAVA code vulnerability detection based on graph representation learning and software homology comparison to carry out research. This project will be based on the content of deep learning, using a large number of vulnerable source code, extracting its features, and transforming it into a graph so that it can be tested source code for comparison and report the vulnerability content. The main work and results of this project are as follows: 1.By extracting the example dataset and generating json files to save the feature information of relevant java code; by generating vector files, bytecode files and dot files, and batch extracting nodes and edges based on the contents of the dot files for subsequent machine learning use, the before and after steps and operations form a logical self-consistency to ensure the integrity of the project. 2.Through the study of graph neural networks and graph convolutional neural networks, relevant models are selected for machine learning using predecessor files and manual model tuning is performed to ensure good learning results and feedback for the machine learning part of the project. 3.This project training dataset negative samples for sard above the shared dataset, which contains 46636 java vulnerability source code, and dataset support environment, test dataset negative samples dataset also from sard, positive samples dataset are generated from the relevant person in charge. 4.Based on Graph Neural Network (GNN) and Graph Convolutional Neural Network (GCN), this project will design and implement a whole set of automated vulnerability detection system for java code. 5. All the related contents of this project, after the human extensive search of domestic and foreign related papers and materials, there are not all projects or contents similar to this project, the same papers and materials appear, all the problems involved in this project and related ideas are for the project this group of people thinking, looking for solutions.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Graph representation learning and software homology matching based A study of JAVA code vulnerability detection techniques\",\"authors\":\"Yibin Yang, Xin Bo, Zitong Wang, Xinrui Shao, Xinjie Xie\",\"doi\":\"10.1145/3590003.3590028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In nowadays using different tools and apps is a basic need of people's behavior in life, but the security issues arising from the existence of source code plagiarism among tools and apps are likely to bring huge losses to companies and even countries, so detecting the existence of vulnerabilities or malicious code in software becomes an important part of protecting information and detecting software security. This project is based on the aspect of JAVA code vulnerability detection based on graph representation learning and software homology comparison to carry out research. This project will be based on the content of deep learning, using a large number of vulnerable source code, extracting its features, and transforming it into a graph so that it can be tested source code for comparison and report the vulnerability content. The main work and results of this project are as follows: 1.By extracting the example dataset and generating json files to save the feature information of relevant java code; by generating vector files, bytecode files and dot files, and batch extracting nodes and edges based on the contents of the dot files for subsequent machine learning use, the before and after steps and operations form a logical self-consistency to ensure the integrity of the project. 2.Through the study of graph neural networks and graph convolutional neural networks, relevant models are selected for machine learning using predecessor files and manual model tuning is performed to ensure good learning results and feedback for the machine learning part of the project. 3.This project training dataset negative samples for sard above the shared dataset, which contains 46636 java vulnerability source code, and dataset support environment, test dataset negative samples dataset also from sard, positive samples dataset are generated from the relevant person in charge. 4.Based on Graph Neural Network (GNN) and Graph Convolutional Neural Network (GCN), this project will design and implement a whole set of automated vulnerability detection system for java code. 5. All the related contents of this project, after the human extensive search of domestic and foreign related papers and materials, there are not all projects or contents similar to this project, the same papers and materials appear, all the problems involved in this project and related ideas are for the project this group of people thinking, looking for solutions.\",\"PeriodicalId\":340225,\"journal\":{\"name\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3590003.3590028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3590003.3590028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Graph representation learning and software homology matching based A study of JAVA code vulnerability detection techniques
In nowadays using different tools and apps is a basic need of people's behavior in life, but the security issues arising from the existence of source code plagiarism among tools and apps are likely to bring huge losses to companies and even countries, so detecting the existence of vulnerabilities or malicious code in software becomes an important part of protecting information and detecting software security. This project is based on the aspect of JAVA code vulnerability detection based on graph representation learning and software homology comparison to carry out research. This project will be based on the content of deep learning, using a large number of vulnerable source code, extracting its features, and transforming it into a graph so that it can be tested source code for comparison and report the vulnerability content. The main work and results of this project are as follows: 1.By extracting the example dataset and generating json files to save the feature information of relevant java code; by generating vector files, bytecode files and dot files, and batch extracting nodes and edges based on the contents of the dot files for subsequent machine learning use, the before and after steps and operations form a logical self-consistency to ensure the integrity of the project. 2.Through the study of graph neural networks and graph convolutional neural networks, relevant models are selected for machine learning using predecessor files and manual model tuning is performed to ensure good learning results and feedback for the machine learning part of the project. 3.This project training dataset negative samples for sard above the shared dataset, which contains 46636 java vulnerability source code, and dataset support environment, test dataset negative samples dataset also from sard, positive samples dataset are generated from the relevant person in charge. 4.Based on Graph Neural Network (GNN) and Graph Convolutional Neural Network (GCN), this project will design and implement a whole set of automated vulnerability detection system for java code. 5. All the related contents of this project, after the human extensive search of domestic and foreign related papers and materials, there are not all projects or contents similar to this project, the same papers and materials appear, all the problems involved in this project and related ideas are for the project this group of people thinking, looking for solutions.