AHDom: Algorithmically generated domain detection using attribute heterogeneous graph neural network
Xiaoyan Hu, Di Li, Miao Li, Guang Cheng, Ruidong Li, Hua Wu
Computer Networks, published 2024-09-04
DOI: 10.1016/j.comnet.2024.110770
https://www.sciencedirect.com/science/article/pii/S1389128624006029
Abstract
Many cyber-attacks use Algorithmically Generated Domain (AGD) names to connect to command-and-control servers and stage subsequent attacks. Identifying and blocking such AGDs helps detect and prevent attacks quickly. Traditional machine learning and deep learning detection methods rely only on the features of individual domains and struggle to accurately distinguish AGDs that attackers have crafted to evade detection. Researchers therefore leverage the associations among domains, clients, and resolved IP addresses to detect AGDs, and heterogeneous graph neural networks are widely used for this purpose. However, most existing methods rely on these associated features alone, which leads to inaccurate detection of isolated domain nodes. In addition, most existing detection methods employ transductive learning and are time-consuming. This paper proposes an AGD detection method, AHDom, to address these challenges. AHDom models DNS traffic as a Heterogeneous Information Network (HIN) to capture the intricate relationships between domains, clients, and resolved IP addresses. It also extracts character and behavior features as the initial attributes of domains to obtain an Attribute HIN (AHIN), which improves detection accuracy for isolated domain nodes. On the AHIN, it combines meta-path random walks, the inductive learning algorithm GraphSAGE, and an attention mechanism to obtain effective embedding representations of domain nodes, and then classifies domains based on these embeddings. Our experimental results demonstrate that AHDom outperforms state-of-the-art methods in both the performance and the efficiency of AGD detection: it achieves an average accuracy of 98.74% on our constructed dataset and requires only about 30.23% of the testing time of the best existing graph neural network approach.
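As a rough illustration of two ideas named in the abstract, the sketch below builds a small typed graph of clients, domains, and resolved IP addresses from DNS records, attaches simple character attributes to a domain, and runs a meta-path random walk over the graph. Everything in it is an assumption made for illustration: the sample records, the function names (`build_hin`, `char_features`, `metapath_walk`), the chosen character features, and the meta-paths "DCD" (domain, client, domain) and "DID" (domain, IP, domain) are not taken from the paper, and the GraphSAGE aggregation and attention steps are not shown.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical DNS log records: (client, queried domain, resolved IP).
RECORDS = [
    ("c1", "example.com", "93.0.0.1"),
    ("c1", "xkqzjvbw.net", "10.0.0.9"),
    ("c2", "xkqzjvbw.net", "10.0.0.9"),
    ("c2", "qpwozmxn.org", "10.0.0.9"),
]


def build_hin(records):
    """Build typed adjacency lists: adj[(type, node)][neighbor_type] -> sorted neighbors.

    Node types are "C" (client), "D" (domain), and "I" (resolved IP).
    """
    sets = defaultdict(lambda: defaultdict(set))
    for client, domain, ip in records:
        sets[("C", client)]["D"].add(domain)
        sets[("D", domain)]["C"].add(client)
        sets[("D", domain)]["I"].add(ip)
        sets[("I", ip)]["D"].add(domain)
    return {node: {t: sorted(ns) for t, ns in nbrs.items()}
            for node, nbrs in sets.items()}


def char_features(domain):
    """Illustrative character attributes of the first label:
    length, digit ratio, and Shannon entropy (assumed features)."""
    label = domain.split(".")[0]
    n = len(label)
    if n == 0:
        return [0, 0.0, 0.0]
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    return [n, digit_ratio, entropy]


def metapath_walk(adj, start, metapath, walk_len, rng):
    """Random walk that follows the node-type pattern in `metapath` cyclically.

    `metapath` is a string such as "DCD" whose first and last types match;
    `start` must be a node of type metapath[0]. The walk stops early if the
    current node has no neighbor of the required next type.
    """
    walk = [start]
    node_type, node = metapath[0], start
    period = len(metapath) - 1
    step = 0
    while len(walk) < walk_len:
        next_type = metapath[(step % period) + 1]
        candidates = adj.get((node_type, node), {}).get(next_type, [])
        if not candidates:
            break
        node = rng.choice(candidates)
        node_type = next_type
        walk.append(node)
        step += 1
    return walk


adj = build_hin(RECORDS)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(char_features("xkqzjvbw.net"))
print(metapath_walk(adj, "xkqzjvbw.net", "DCD", 5, rng))
print(metapath_walk(adj, "xkqzjvbw.net", "DID", 5, rng))
```

In a walk-based embedding pipeline, walks like these would typically select the neighbors whose attributes are aggregated for each domain node. A domain with no client or IP neighbors produces a walk of length one, which is the isolated-node case where initial node attributes matter most.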
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.