Paulo M. M. R. Alves, Geraldo P. R. Filho, Vinícius P. Gonçalves
Leveraging BERT's Power to Classify TTP from Unstructured Text
2022 Workshop on Communication Networks and Power Systems (WCNPS), published 2022-11-17. DOI: 10.1109/WCNPS56355.2022.9969697
Citations: 1
Abstract
Tactics, Techniques and Procedures (TTP) are valuable information for cybersecurity analysts. However, they are mostly disseminated through unstructured text. This work proposes tackling this problem with BERT models, a state-of-the-art approach in Natural Language Processing, and investigates the effect of selected hyperparameters on the fine-tuning of these models. MITRE's example sentences are used to fine-tune eleven BERT models, with the goal of finding the best model and the best combination of hyperparameters for classifying TTPs according to the ATT&CK framework. The best models reached accuracies of 82.64% and 78.75% on the two datasets tested, demonstrating the potential of BERT models for the complex task of TTP classification. Finally, we draw insights from the misclassified data that help to better understand the dataset and how the models handle and classify it.
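The evaluation loop described above consists of trying several models under several hyperparameter combinations, keeping the most accurate one, and then inspecting its errors. The following is a minimal, framework-free sketch of that loop, not the authors' code. The abstract does not name the hyperparameters, so the grid below (learning rate, batch size, epochs, in ranges commonly used for BERT fine-tuning) is an assumption. `evaluate` is a hypothetical caller-supplied function standing in for "fine-tune this model with this configuration and return its accuracy on held-out sentences". The technique IDs used in the test are illustrative labels only.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search space: the abstract only mentions "some chosen
# hyperparameters", so these names and values are illustrative assumptions.
GRID: Dict[str, Sequence] = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "epochs": [2, 3, 4],
}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of sentences whose predicted ATT&CK technique label is correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(
    models: Sequence[str],
    grid: Dict[str, Sequence],
    evaluate: Callable[[str, Dict], float],
) -> Tuple[str, Dict, float]:
    """Score every (model, hyperparameter combination) pair and keep the best.

    `evaluate` is a hypothetical placeholder: a real one would fine-tune the
    named BERT checkpoint with `config` and return validation accuracy.
    Ties keep the first combination encountered.
    """
    keys = list(grid)
    best = None
    for model in models:
        for values in product(*(grid[k] for k in keys)):
            config = dict(zip(keys, values))
            score = evaluate(model, config)
            if best is None or score > best[2]:
                best = (model, config, score)
    if best is None:
        raise ValueError("empty search space")
    return best


def confusion_pairs(
    y_true: Sequence[str], y_pred: Sequence[str], top: int = 5
) -> List[Tuple[Tuple[str, str], int]]:
    """Most frequent (true, predicted) label pairs among misclassified sentences."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top)
```

Separating the search from `evaluate` keeps the selection logic testable without downloading any model. The fine-tuning itself (for example, with a BERT library of your choice) would live entirely inside the callback. `confusion_pairs` gives a starting point for the kind of misclassification analysis mentioned in the abstract by surfacing which techniques are most often confused with one another.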