Author: Jongho Shin
Published in: Proceedings of the 2020 European Interdisciplinary Cybersecurity Conference, 2020-11-18
DOI: 10.1145/3424954.3424957 (https://doi.org/10.1145/3424954.3424957)
Cross-domain meta-learning for bug finding in the source codes with a small dataset
In application security, detecting vulnerabilities in advance and fixing them is one of the effective ways to prevent malicious activity. However, finding security bugs is complex and relies heavily on human experts. As a result, source code auditing, one way of finding such bugs, is expensive, and its quality varies considerably with the auditor. There have been many attempts to automate code auditing, but the resulting systems have suffered from large numbers of false positives and false negatives. Meanwhile, machine learning has advanced dramatically in recent years and now outperforms humans on many tasks, so there have been many efforts to apply it to security research. In most cases, however, legitimate training data is very difficult to obtain, and in security the rarer a vulnerability is, the more dangerous it often is. It is therefore hard to build reliable machine learning systems for security defects, and the field still relies heavily on human experts, who can learn from just a few examples. To overcome this obstacle, this paper proposes a deep neural network model for finding security bugs that takes advantage of recent developments in machine learning: the language model adapts sub-word tokenization and a self-attention-based transformer from natural language processing to understand source code, and a meta-learning technique from computer vision to compensate for the lack of legitimate vulnerability samples. The model is also evaluated on finding DOM-based XSS bugs, which are prevalent but hard to spot with traditional detection methods. The results show that the model outperforms the baseline by 45% in F1 score.
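The abstract reports its result as an F1 score. That metric suits a detector troubled by both false positives and false negatives, because it is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition only. The function name and the counts in the usage lines are made up for illustration and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when precision or recall is undefined or when both are zero.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive  # share of flagged items that are real bugs
    recall = true_positives / actual_positive        # share of real bugs that were flagged
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector: 8 bugs found, 2 false alarms, 4 bugs missed.
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(f"F1 = {example_f1:.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high F1 score by flagging everything (high recall, low precision) or by flagging almost nothing (high precision, low recall).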