Graph representation learning and software homology matching based A study of JAVA code vulnerability detection techniques

Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning. Pub Date: 2023-03-17. DOI: 10.1145/3590003.3590028

Yibin Yang, Xin Bo, Zitong Wang, Xinrui Shao, Xinjie Xie



Abstract

Using a variety of tools and apps has become a basic part of daily life, but source code plagiarism among tools and apps creates security issues that can cause heavy losses to companies and even countries. Detecting vulnerabilities or malicious code in software has therefore become an important part of protecting information and assessing software security. This project studies Java code vulnerability detection based on graph representation learning and software homology comparison. Building on deep learning, it takes a large body of vulnerable source code, extracts its features, and transforms it into graphs, so that source code under test can be compared against them and any vulnerabilities found can be reported. The main work and results of this project are as follows:

1. Feature information of the relevant Java code is extracted from the example dataset and saved in JSON files. Vector files, bytecode files, and DOT files are then generated, and nodes and edges are batch-extracted from the DOT files for subsequent machine learning. Each step feeds the next, which keeps the pipeline logically self-consistent and the project complete.

2. After a study of graph neural networks and graph convolutional neural networks, suitable models are selected and trained on the files produced in the preceding step, and the models are tuned manually to ensure good learning results and feedback for the machine learning part of the project.

3. The negative samples of the training dataset come from the dataset shared on SARD, which contains 46,636 vulnerable Java source files together with the dataset's support environment. The negative samples of the test dataset also come from SARD, while the positive samples were generated by the team members responsible for that task.

4. Based on the Graph Neural Network (GNN) and the Graph Convolutional Neural Network (GCN), this project designs and implements a complete automated vulnerability detection system for Java code.

5. An extensive manual search of related papers and materials, both domestic and international, found no project or publication identical to this one. All problems addressed in this project, and the ideas and solutions for them, were worked out by the project team.
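
The node and edge extraction from DOT files mentioned in item 1 could look roughly like the following. This is only a minimal sketch, not the authors' implementation: the abstract does not show the DOT files, so the sample graph, the node naming (`n0`, `n1`, ...), the `label` attribute format, and the function names `extract_graph` and `to_index_edges` are all assumptions made for illustration. It handles only simple one-statement-per-line DOT text with labels that contain no escaped quotes.

```python
import re

# Hypothetical DOT text standing in for one generated graph file.
# The real files produced by the project may use a different layout.
SAMPLE_DOT = '''
digraph "Example.main" {
    n0 [label="ENTRY"];
    n1 [label="q = getParameter(id)"];
    n2 [label="executeQuery(q)"];
    n0 -> n1;
    n1 -> n2;
}
'''

# A node line looks like:  n0 [label="..."];
NODE_RE = re.compile(r'^\s*(\w+)\s*\[label="([^"]*)"\]\s*;?\s*$')
# An edge line looks like: n0 -> n1;  (optional attribute list ignored)
EDGE_RE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)\s*(?:\[[^\]]*\])?\s*;?\s*$')


def extract_graph(dot_text):
    """Return (nodes, edges) where nodes maps node id -> label
    and edges is a list of (source id, target id) pairs."""
    nodes = {}
    edges = []
    for line in dot_text.splitlines():
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
            continue
        node = NODE_RE.match(line)
        if node:
            nodes[node.group(1)] = node.group(2)
    return nodes, edges


def to_index_edges(nodes, edges):
    """Map node ids to consecutive integers, the form most graph
    learning code expects, and rewrite the edges with those integers."""
    index = {name: i for i, name in enumerate(nodes)}
    # Nodes that appear only in edges still need an index.
    for src, dst in edges:
        for name in (src, dst):
            index.setdefault(name, len(index))
    return index, [(index[s], index[d]) for s, d in edges]


nodes, edges = extract_graph(SAMPLE_DOT)
index, index_edges = to_index_edges(nodes, edges)
print(nodes)
print(index_edges)
```

For batch extraction, the same two functions would be called once per DOT file and the resulting node labels and integer edge lists stored for the later GNN or GCN training step. A full DOT parser would be the safer choice if the generated files use multi-line statements, subgraphs, or quoted identifiers.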

