{"title":"公司VulSeeker: BERT和基于Siamese的嵌入式设备固件映像漏洞","authors":"Yingchao Yu, Shuitao Gan, Xiaojun Qin","doi":"10.1109/ISCC53001.2021.9631481","DOIUrl":null,"url":null,"abstract":"In this paper, we propose firmVulSeeker-a vulnerability search tool for embedded firmware images, based on BERT and Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context in a very large unlabeled binary corpus. Then, a finetune mode based on Siamese network is constructed to guide training and matching semantically similar functions using the knowledge learned from the first stage. Finally, it will use a function embedding generated from the fine-tuned model to search in the targeted corpus and find the most similar function which will be confirmed whether it's a real vulnerability manually. We evaluate the accuracy, robustness, scalability and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions, and can successfully find more real vulnerabilities in real-world firmware than other tools.","PeriodicalId":270786,"journal":{"name":"2021 IEEE Symposium on Computers and Communications (ISCC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"firm VulSeeker: BERT and Siamese based Vulnerability for Embedded Device Firmware Images\",\"authors\":\"Yingchao Yu, Shuitao Gan, Xiaojun Qin\",\"doi\":\"10.1109/ISCC53001.2021.9631481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose firmVulSeeker-a vulnerability search tool for embedded firmware images, based on BERT and Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context in a very large unlabeled binary corpus. Then, a finetune mode based on Siamese network is constructed to guide training and matching semantically similar functions using the knowledge learned from the first stage. Finally, it will use a function embedding generated from the fine-tuned model to search in the targeted corpus and find the most similar function which will be confirmed whether it's a real vulnerability manually. We evaluate the accuracy, robustness, scalability and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions, and can successfully find more real vulnerabilities in real-world firmware than other tools.\",\"PeriodicalId\":270786,\"journal\":{\"name\":\"2021 IEEE Symposium on Computers and Communications (ISCC)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Symposium on Computers and Communications (ISCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCC53001.2021.9631481\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Symposium on Computers and Communications (ISCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCC53001.2021.9631481","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
firm VulSeeker: BERT and Siamese based Vulnerability for Embedded Device Firmware Images
In this paper, we propose firmVulSeeker-a vulnerability search tool for embedded firmware images, based on BERT and Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context in a very large unlabeled binary corpus. Then, a finetune mode based on Siamese network is constructed to guide training and matching semantically similar functions using the knowledge learned from the first stage. Finally, it will use a function embedding generated from the fine-tuned model to search in the targeted corpus and find the most similar function which will be confirmed whether it's a real vulnerability manually. We evaluate the accuracy, robustness, scalability and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions, and can successfully find more real vulnerabilities in real-world firmware than other tools.