{"title":"自然语言漏洞报告的半自动标注","authors":"Yan Wu, R. Gandhi, Harvey P. Siy","doi":"10.4018/JSSE.2013070102","DOIUrl":null,"url":null,"abstract":"Those who do not learn from past vulnerabilities are bound to repeat it. Consequently, there have been several research efforts to enumerate and categorize software weaknesses that lead to vulnerabilities. The Common Weakness Enumeration CWE is a community developed dictionary of software weakness types and their relationships, designed to consolidate these efforts. Yet, aggregating and classifying natural language vulnerability reports with respect to weakness standards is currently a painstaking manual effort. In this paper, the authors present a semi-automated process for annotating vulnerability information with semantic concepts that are traceable to CWE identifiers. The authors present an information-processing pipeline to parse natural language vulnerability reports. The resulting terms are used for learning the syntactic cues in these reports that are indicators for corresponding standard weakness definitions. Finally, the results of multiple machine learning algorithms are compared individually as well as collectively to semi-automatically annotate new vulnerability reports.","PeriodicalId":89158,"journal":{"name":"International journal of secure software engineering","volume":"472 1","pages":"18-41"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Semi-Automatic Annotation of Natural Language Vulnerability Reports\",\"authors\":\"Yan Wu, R. Gandhi, Harvey P. Siy\",\"doi\":\"10.4018/JSSE.2013070102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Those who do not learn from past vulnerabilities are bound to repeat it. Consequently, there have been several research efforts to enumerate and categorize software weaknesses that lead to vulnerabilities. The Common Weakness Enumeration CWE is a community developed dictionary of software weakness types and their relationships, designed to consolidate these efforts. Yet, aggregating and classifying natural language vulnerability reports with respect to weakness standards is currently a painstaking manual effort. In this paper, the authors present a semi-automated process for annotating vulnerability information with semantic concepts that are traceable to CWE identifiers. The authors present an information-processing pipeline to parse natural language vulnerability reports. The resulting terms are used for learning the syntactic cues in these reports that are indicators for corresponding standard weakness definitions. Finally, the results of multiple machine learning algorithms are compared individually as well as collectively to semi-automatically annotate new vulnerability reports.\",\"PeriodicalId\":89158,\"journal\":{\"name\":\"International journal of secure software engineering\",\"volume\":\"472 1\",\"pages\":\"18-41\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of secure software engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/JSSE.2013070102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of secure software engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/JSSE.2013070102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semi-Automatic Annotation of Natural Language Vulnerability Reports
Those who do not learn from past vulnerabilities are bound to repeat it. Consequently, there have been several research efforts to enumerate and categorize software weaknesses that lead to vulnerabilities. The Common Weakness Enumeration CWE is a community developed dictionary of software weakness types and their relationships, designed to consolidate these efforts. Yet, aggregating and classifying natural language vulnerability reports with respect to weakness standards is currently a painstaking manual effort. In this paper, the authors present a semi-automated process for annotating vulnerability information with semantic concepts that are traceable to CWE identifiers. The authors present an information-processing pipeline to parse natural language vulnerability reports. The resulting terms are used for learning the syntactic cues in these reports that are indicators for corresponding standard weakness definitions. Finally, the results of multiple machine learning algorithms are compared individually as well as collectively to semi-automatically annotate new vulnerability reports.