A Real-Time P2P Bot Host Detection in a Large-Scale Network Using Statistical Network Traffic Features and Apache Spark Streaming Platform
Authors: S. Saravanan, G. Prakash, B. Uma Maheswari
Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT)
Publication date: 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126429 (https://doi.org/10.1109/I2CT57861.2023.10126429)
Citations: 0
Abstract
Peer-to-Peer (P2P) bots play a significant role in launching attacks such as phishing, distributed denial-of-service (DDoS), email spam, click fraud, and cryptocurrency mining. Analyzing the statistical network traffic features of hosts is a commonly used method for detecting P2P bots. As the Internet keeps growing, modern P2P bot detection systems must extract these features from massive streaming network traffic. However, traditional detection systems struggle to detect bots in real time in large-scale networks because they are not implemented on big data streaming platforms. This work therefore proposes a network flow-based P2P bot detection system, implemented on the Apache Spark Structured Streaming platform, that detects P2P bots in real time by analyzing massive streaming network traffic generated by large-scale networks. Detection is based on three statistical network traffic features: destination diversity ratio, control packets ratio, and total source bytes sent in a flow. The proposed system has two components: the first detects potential P2P hosts using the Destination Diversity Ratio (DDR), and the second identifies P2P bot hosts among the P2P hosts flagged by the first. Because the performance of both components depends on the time window over which the statistical features are extracted, this work also reports experiments on the effect of different time windows on the detection components. Evaluated on real-world datasets, the proposed system achieves a True Positive Rate (TPR) of 99.87%.
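
The windowed feature extraction described in the abstract can be illustrated with a small, framework-free Python sketch. The abstract names the three features but does not define them, so the formulas here are assumptions made only for illustration: DDR is taken as distinct destinations divided by flow count, control packets ratio as control packets divided by all packets, and the byte feature as mean source bytes per flow. The `Flow` record, the function names, and the threshold value are hypothetical and are not taken from the paper. The sketch covers only per-host tumbling-window aggregation and a DDR threshold for the first component; it does not attempt the second (bot versus benign P2P) component.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are illustrative only."""
    ts: float             # flow start time in seconds
    src: str              # source host IP
    dst: str              # destination IP
    src_bytes: int        # bytes sent by the source in this flow
    packets: int          # total packets in the flow
    control_packets: int  # packets treated as control traffic (assumption)


def window_features(flows, window_s):
    """Compute per-host features over tumbling windows of window_s seconds.

    Returns a dict keyed by (window_index, src). All three formulas are
    assumed definitions, not the paper's.
    """
    groups = defaultdict(list)
    for f in flows:
        groups[(int(f.ts // window_s), f.src)].append(f)

    features = {}
    for key, fs in groups.items():
        total_packets = sum(f.packets for f in fs)
        control = sum(f.control_packets for f in fs)
        features[key] = {
            # Assumed DDR: distinct destinations / number of flows
            "ddr": len({f.dst for f in fs}) / len(fs),
            # Assumed control packets ratio: control packets / all packets
            "control_ratio": control / total_packets if total_packets else 0.0,
            # Assumed byte feature: mean source bytes per flow
            "mean_src_bytes": sum(f.src_bytes for f in fs) / len(fs),
        }
    return features


def potential_p2p_hosts(features, ddr_threshold):
    """First-stage filter: keep (window, host) pairs with DDR at or above
    a threshold. The threshold is a hypothetical tuning parameter."""
    return {key for key, v in features.items() if v["ddr"] >= ddr_threshold}


if __name__ == "__main__":
    sample = [
        Flow(0, "10.0.0.1", "198.51.100.1", 100, 10, 2),
        Flow(10, "10.0.0.1", "203.0.113.7", 300, 10, 3),
        Flow(20, "10.0.0.2", "192.0.2.9", 50, 5, 0),
        Flow(30, "10.0.0.2", "192.0.2.9", 50, 5, 0),
    ]
    feats = window_features(sample, window_s=60)
    print(sorted(potential_p2p_hosts(feats, ddr_threshold=0.8)))
```

Changing `window_s` regroups the same flows and can shift every feature value, which is one way to see why the choice of time window matters for both detection components. On a streaming platform the same grouping would be expressed as a windowed aggregation over the flow stream rather than an in-memory dictionary.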