识别GitHub基础机器学习存储库中的漏洞发生率模式:一种无监督图嵌入方法

Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu
{"title":"识别GitHub基础机器学习存储库中的漏洞发生率模式:一种无监督图嵌入方法","authors":"Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu","doi":"10.1109/ICDMW58026.2022.00084","DOIUrl":null,"url":null,"abstract":"The rapid pace of the development of artificial intelligence (AI) solutions is enabled by leveraging foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, the security vulnerabilities present in foundation repositories might cause irreparable damage due to the AI solutions built using these libraries being deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as the AI repositories that utilize foundation repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we performed clustering as our downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigation of vulnerabilities present in groups of repositories based on foundational AI repositories. We also discuss the implications of identifying such clusters of vulnerable repositories.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach\",\"authors\":\"Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu\",\"doi\":\"10.1109/ICDMW58026.2022.00084\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid pace of the development of artificial intelligence (AI) solutions is enabled by leveraging foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, the security vulnerabilities present in foundation repositories might cause irreparable damage due to the AI solutions built using these libraries being deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as the AI repositories that utilize foundation repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we performed clustering as our downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigation of vulnerabilities present in groups of repositories based on foundational AI repositories. We also discuss the implications of identifying such clusters of vulnerable repositories.\",\"PeriodicalId\":146687,\"journal\":{\"name\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW58026.2022.00084\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW58026.2022.00084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

人工智能(AI)解决方案的快速发展是通过利用基础工具和框架来实现的,这些工具和框架允许AI开发人员专注于应用程序逻辑和快速原型。然而,由于在生产环境中部署使用这些库构建的AI解决方案,基础存储库中存在的安全漏洞可能会造成无法弥补的损害。我们的研究利用了托管在主流社交编码平台GitHub上的源代码来识别现代人工智能开发常用的基础存储库(Linux、BERT、PyTorch和Transformers)中的漏洞,以及利用基础存储库作为依赖的人工智能存储库。使用无监督图嵌入方法,我们生成捕获漏洞信息和存储库之间关系的图嵌入。基于这些嵌入,我们执行集群作为下游任务,对类似的易受攻击的存储库进行分组。我们的研究确定了存储库之间的模式和相似性,并将有助于有效缓解基于基础AI存储库的存储库组中存在的漏洞。我们还讨论了识别易受攻击存储库集群的含义。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach
The rapid pace of the development of artificial intelligence (AI) solutions is enabled by leveraging foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, the security vulnerabilities present in foundation repositories might cause irreparable damage due to the AI solutions built using these libraries being deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as the AI repositories that utilize foundation repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we performed clustering as our downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigation of vulnerabilities present in groups of repositories based on foundational AI repositories. We also discuss the implications of identifying such clusters of vulnerable repositories.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信