LDA在编校文件安全分析中的应用评价

2023 IEEE International Conference on Cyber Security and Resilience (CSR) Pub Date : 2023-07-31 DOI:10.1109/csr57506.2023.10224991

K. Umezawa, Sven Wohlgemuth, Keisuke Hasegawa, K. Takaragi

{"title":"LDA在编校文件安全分析中的应用评价","authors":"K. Umezawa, Sven Wohlgemuth, Keisuke Hasegawa, K. Takaragi","doi":"10.1109/csr57506.2023.10224991","DOIUrl":null,"url":null,"abstract":"Cyber attacks are often executed by imitating existing attacks and combining them. Using existing vulnerability databases, we have presented a way to semi-automatically determine the presence of vulnerabilities in the design documents of products under development. We have calculated the similarity between documents using the Latent Dirichlet Allocation (LDA) technology and compared the design document of the new product with the vulnerability database. When this comparison processing is conducted by a third party as a service, it may be desirable to not inadvertently disclose a part of the design document of the new product to the third party. In this study, we used the LDA technique to experimentally verify that the calculated similarity value does not deteriorate even when a portion of the design document is encrypted or obfuscated. In conclusion, we discovered no substantial difference in similarity with the original document; however, there are changes in numerical values depending on the words to be encrypted/obfuscated. In particular, the degradation of similarity is very small when the version number is encrypted/obfuscated.","PeriodicalId":354918,"journal":{"name":"2023 IEEE International Conference on Cyber Security and Resilience (CSR)","volume":"207 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Applying LDA to Redacted Documents in Security and Safety Analysis\",\"authors\":\"K. Umezawa, Sven Wohlgemuth, Keisuke Hasegawa, K. Takaragi\",\"doi\":\"10.1109/csr57506.2023.10224991\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cyber attacks are often executed by imitating existing attacks and combining them. Using existing vulnerability databases, we have presented a way to semi-automatically determine the presence of vulnerabilities in the design documents of products under development. We have calculated the similarity between documents using the Latent Dirichlet Allocation (LDA) technology and compared the design document of the new product with the vulnerability database. When this comparison processing is conducted by a third party as a service, it may be desirable to not inadvertently disclose a part of the design document of the new product to the third party. In this study, we used the LDA technique to experimentally verify that the calculated similarity value does not deteriorate even when a portion of the design document is encrypted or obfuscated. In conclusion, we discovered no substantial difference in similarity with the original document; however, there are changes in numerical values depending on the words to be encrypted/obfuscated. In particular, the degradation of similarity is very small when the version number is encrypted/obfuscated.\",\"PeriodicalId\":354918,\"journal\":{\"name\":\"2023 IEEE International Conference on Cyber Security and Resilience (CSR)\",\"volume\":\"207 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Conference on Cyber Security and Resilience (CSR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/csr57506.2023.10224991\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Cyber Security and Resilience (CSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/csr57506.2023.10224991","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

网络攻击通常是通过模仿现有的攻击并将它们结合起来来执行的。利用现有的漏洞数据库，我们提出了一种半自动地确定正在开发的产品的设计文档中是否存在漏洞的方法。我们利用潜在狄利克雷分配(Latent Dirichlet Allocation, LDA)技术计算了文档之间的相似度，并将新产品的设计文档与漏洞数据库进行了比较。当这种比较处理作为一种服务由第三方进行时，最好不要无意中将新产品的部分设计文档泄露给第三方。在这项研究中，我们使用LDA技术来实验验证，即使设计文档的一部分被加密或混淆，计算的相似性值也不会恶化。总之，我们发现与原始文件的相似度没有实质性差异;但是，根据要加密/混淆的单词，数值会发生变化。特别是，当版本号被加密/混淆时，相似度的降低非常小。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Evaluation of Applying LDA to Redacted Documents in Security and Safety Analysis

Cyber attacks are often executed by imitating existing attacks and combining them. Using existing vulnerability databases, we have presented a way to semi-automatically determine the presence of vulnerabilities in the design documents of products under development. We have calculated the similarity between documents using the Latent Dirichlet Allocation (LDA) technology and compared the design document of the new product with the vulnerability database. When this comparison processing is conducted by a third party as a service, it may be desirable to not inadvertently disclose a part of the design document of the new product to the third party. In this study, we used the LDA technique to experimentally verify that the calculated similarity value does not deteriorate even when a portion of the design document is encrypted or obfuscated. In conclusion, we discovered no substantial difference in similarity with the original document; however, there are changes in numerical values depending on the words to be encrypted/obfuscated. In particular, the degradation of similarity is very small when the version number is encrypted/obfuscated.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE International Conference on Cyber Security and Resilience (CSR)

自引率

0.00%

发文量