Shuo Wang, Mahathir Almashor, A. Abuadbba, Ruoxi Sun, Minhui Xue, Calvin Wang, R. Gaire, Surya Nepal, S. Çamtepe
Published in: Proceedings 2023 Network and Distributed System Security Symposium
DOI: https://doi.org/10.14722/ndss.2023.24322
DOITRUST: Dissecting On-chain Compromised Internet Domains via Graph Learning
Traditional block/allow lists remain a significant defense against malicious websites by limiting end-users’ access to domain names. However, such lists are often incomplete and reactive in nature. In this work, we first introduce an expansion graph that creates organically grown Internet domain allow-lists based on trust transitivity by crawling hyperlinks. Then, we highlight a gap in monitoring the nodes of such an expansion graph: malicious nodes can be buried deep along the paths from compromised websites, which we term “on-chain compromise”. Stealthiness (evasion of detection) and large scale impede the application of existing malicious-web analysis methods to identifying on-chain compromises within the sparsely labeled graph. To address the unique challenges of revealing on-chain compromises, we propose a two-step integrated scheme, DOITRUST, leveraging both individual node features and topology analysis: (i) we develop a semi-supervised suspicion prediction scheme to predict the probability of a node being relevant to targets of compromise (i.e., the denied nodes), including a novel node ranking approach as an efficient global propagation scheme to incorporate the topology information, and a scalable graph learning scheme to separate the global propagation from the training of the local prediction model; and (ii) based on the suspicion prediction results, we propose efficient pruning strategies to further remove highly suspicious nodes from the crawled graph and analyze the underlying indicator of compromise. Experimental results show that DOITRUST achieves 90% accuracy using less than 1% labeled nodes for the suspicion prediction, and its learning capability outperforms existing node-based and structure-based approaches. We also demonstrate that DOITRUST is portable and practical. We manually review the detected compromised nodes, finding that at least 94.55% of them have suspicious content, and investigate the
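The pipeline summarized in the abstract (grow a graph from trusted seeds by following hyperlinks, score each node locally, propagate suspicion over the topology as a separate step, then prune) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the ranking or pruning algorithms, so the sketch substitutes a generic personalized-PageRank-style smoothing for the global propagation and a fixed threshold for pruning. All function names, the `links` dictionary (a stand-in for a crawler), the `local` scores (a stand-in for a trained local prediction model), and the parameters `alpha`, `iters`, and `threshold` are assumptions made for illustration.

```python
from collections import deque


def build_expansion_graph(seeds, links, max_depth):
    """Breadth-first expansion from trusted seed domains along hyperlinks.

    `links` maps a domain to the domains it links to; it is a hypothetical
    stand-in for a real crawler. Nodes at `max_depth` are not expanded.
    """
    graph = {}
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        targets = links.get(node, []) if depth[node] < max_depth else []
        graph[node] = list(targets)
        for t in targets:
            if t not in depth:
                depth[t] = depth[node] + 1
                queue.append(t)
    return graph


def propagate(graph, local, alpha=0.5, iters=20):
    """Smooth local suspicion scores backward along hyperlinks.

    Each node mixes its own local score with the mean score of the nodes it
    links to. Propagation uses no trainable parameters, so it is decoupled
    from however `local` was produced (assumed: some per-node classifier).
    """
    scores = dict(local)
    for _ in range(iters):
        updated = {}
        for node, outs in graph.items():
            if outs:
                neighbor_mean = sum(scores[v] for v in outs) / len(outs)
                updated[node] = alpha * local[node] + (1 - alpha) * neighbor_mean
            else:
                updated[node] = local[node]
        scores = updated
    return scores


def prune(graph, scores, threshold):
    """Drop nodes whose suspicion reaches `threshold`, and edges into them."""
    keep = {n for n in graph if scores[n] < threshold}
    return {n: [v for v in graph[n] if v in keep] for n in graph if n in keep}


# Toy data: one trusted seed whose link chain ends at a denied domain.
links = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "c.example": ["bad.example"],
}
graph = build_expansion_graph(["seed.example"], links, max_depth=3)
local = {n: 0.0 for n in graph}
local["bad.example"] = 1.0  # the only labeled (denied) node
scores = propagate(graph, local)
pruned = prune(graph, scores, threshold=0.4)
print(sorted(pruned))
```

In this toy run the suspicion assigned to `bad.example` decays along the chain back toward the seed, so `c.example`, which links directly to the denied node, is pruned together with it, while the seed and its near neighbors remain on the allow-list. A real system would replace `local` with learned predictions and choose `alpha` and `threshold` empirically.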