{"title":"改进CVE漏洞分类的特征工程技术评价","authors":"Mounesh Marali, D. R, Narendran Rajagopalan","doi":"10.47392/irjash.2023.s026","DOIUrl":null,"url":null,"abstract":"This paper presents a three-stage approach to analyzing Common Vulnerabilities and Exposures (CVE) vulnerability datasets using machine learning techniques. In the first stage, K-Means clustering, and Linear discriminant analysis (LDA) topic modeling are applied to identify distinct clusters and topics within the dataset. The Elbow method is used to determine the optimal number of clusters for K-Means, while Grid Search is used to find the best topic model for LDA. After labeling 100 random samples from each cluster, the data is split into training and testing sets for use in various classification algorithms in the third stage. The paper contributes to the field by proposing a novel approach to analyzing CVE vulnerability datasets that combines clustering and classification techniques. The use of K-Means clustering and LDA topic modeling allows for the identification of distinct clusters and topics within the dataset, which can be used to improve the accuracy of classification algorithms. The study highlights the importance of using pre-trained word embeddings and dis-cusses the limitations of the proposed approach. Overall, the paper provides valuable insights into the analysis of CVE vulnerability datasets and","PeriodicalId":244861,"journal":{"name":"International Research Journal on Advanced Science Hub","volume":"86 12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Feature Engineering Techniques for Improving CVE Vulnerability Classification\",\"authors\":\"Mounesh Marali, D. R, Narendran Rajagopalan\",\"doi\":\"10.47392/irjash.2023.s026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a three-stage approach to analyzing Common Vulnerabilities and Exposures (CVE) vulnerability datasets using machine learning techniques. In the first stage, K-Means clustering, and Linear discriminant analysis (LDA) topic modeling are applied to identify distinct clusters and topics within the dataset. The Elbow method is used to determine the optimal number of clusters for K-Means, while Grid Search is used to find the best topic model for LDA. After labeling 100 random samples from each cluster, the data is split into training and testing sets for use in various classification algorithms in the third stage. The paper contributes to the field by proposing a novel approach to analyzing CVE vulnerability datasets that combines clustering and classification techniques. The use of K-Means clustering and LDA topic modeling allows for the identification of distinct clusters and topics within the dataset, which can be used to improve the accuracy of classification algorithms. The study highlights the importance of using pre-trained word embeddings and dis-cusses the limitations of the proposed approach. Overall, the paper provides valuable insights into the analysis of CVE vulnerability datasets and\",\"PeriodicalId\":244861,\"journal\":{\"name\":\"International Research Journal on Advanced Science Hub\",\"volume\":\"86 12\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Research Journal on Advanced Science Hub\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.47392/irjash.2023.s026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Research Journal on Advanced Science Hub","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.47392/irjash.2023.s026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluation of Feature Engineering Techniques for Improving CVE Vulnerability Classification
This paper presents a three-stage approach to analyzing Common Vulnerabilities and Exposures (CVE) vulnerability datasets using machine learning techniques. In the first stage, K-Means clustering, and Linear discriminant analysis (LDA) topic modeling are applied to identify distinct clusters and topics within the dataset. The Elbow method is used to determine the optimal number of clusters for K-Means, while Grid Search is used to find the best topic model for LDA. After labeling 100 random samples from each cluster, the data is split into training and testing sets for use in various classification algorithms in the third stage. The paper contributes to the field by proposing a novel approach to analyzing CVE vulnerability datasets that combines clustering and classification techniques. The use of K-Means clustering and LDA topic modeling allows for the identification of distinct clusters and topics within the dataset, which can be used to improve the accuracy of classification algorithms. The study highlights the importance of using pre-trained word embeddings and dis-cusses the limitations of the proposed approach. Overall, the paper provides valuable insights into the analysis of CVE vulnerability datasets and