A novel host-based intrusion detection approach leveraging audit logs
Authors: Jiaqing Jiang, Hongyang Chu, Donghai Tian
Journal: Future Generation Computer Systems-The International Journal of Escience, vol. 174, Article 107995
Publication date: 2025-07-19
DOI: 10.1016/j.future.2025.107995
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25002900
Impact factor: 6.2 (JCR Q1, Computer Science, Theory & Methods)
Citations: 0
Abstract
Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks, such as advanced persistent threats (APT) and living-off-the-land (LoTL) attacks, because these attacks are stealthy and because existing detectors rely on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies the graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, which are combined with a graph convolutional network (GCN) for detection. MalSnif is implemented using PyTorch and Event Tracing for Windows (ETW), and it addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.
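The abstract does not give implementation details, so the following is only a minimal sketch of the general ideas behind steps (1) and (2) and the downsampling step, not MalSnif itself. The event tuples, the `keep_actions` pruning heuristic, and the plain random downsampling are all assumptions made for illustration; the paper's actual log schema, pruning rule, and "strategic" sampling policy may differ.

```python
import random
from collections import defaultdict

# Hypothetical audit events as (subject process, action, object).
# This schema is an assumption, not the paper's ETW record format.
EVENTS = [
    ("explorer.exe", "create_process", "winword.exe"),
    ("winword.exe", "create_process", "powershell.exe"),
    ("powershell.exe", "write_file", "C:/tmp/payload.dll"),
    ("powershell.exe", "connect", "203.0.113.7:443"),
    ("explorer.exe", "read_file", "C:/Users/a/notes.txt"),
    ("winword.exe", "read_file", "C:/Users/a/report.docx"),
]


def build_graph(events):
    """Build a directed provenance graph: subject -> [(action, object), ...]."""
    graph = defaultdict(list)
    for subject, action, obj in events:
        graph[subject].append((action, obj))
    return dict(graph)


def prune_peripheral(graph, keep_actions=frozenset(
        {"create_process", "write_file", "connect"})):
    """Drop edges to leaf objects reached by actions outside keep_actions.

    Assumed heuristic: a leaf is an object with no outgoing edges, and
    only a fixed set of actions is treated as security-relevant.
    """
    pruned = {}
    for subject, edges in graph.items():
        kept = [
            (action, obj)
            for action, obj in edges
            if obj in graph or action in keep_actions
        ]
        if kept:
            pruned[subject] = kept
    return pruned


def downsample(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    pairs = []
    for label, items in by_label.items():
        pairs.extend((item, label) for item in rng.sample(items, target))
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [y for _, y in pairs]
```

Calling `prune_peripheral(build_graph(EVENTS))` on the toy events keeps the process-creation chain and the write and connect edges while discarding the two benign file reads. In a fuller pipeline, the pruned graph and per-node event sequences would then feed the semantic encoder and the GCN described in the abstract.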
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.