Log anomaly detection in AIOps: A real-world implementation using Large Language Models
Miguel De la Cruz Cabello, Tiago Prince Sales, Marcos R. Machado
Systems and Soft Computing, Volume 8, Article 200475 (publication date 2026-06-01; Epub 2026-03-05)
DOI: 10.1016/j.sasc.2026.200475
https://www.sciencedirect.com/science/article/pii/S2772941926000384
Citations: 0
Abstract
This study investigates the application of Large Language Models (LLMs) to log anomaly detection within the emerging field of AIOps, where large-scale operational logs are increasingly used to support reliability engineering and automated incident response. Deploying LLM-based anomaly detection in military environments, however, raises practical constraints, including strict data confidentiality, limited data sharing, and frequent shifts in operational conditions and log formats. To address these challenges, we design and implement a self-supervised anomaly detection framework based on LogBERT, trained only on normal Linux syslog sequences and deployed locally to avoid external dependencies. We explore critical parameters for detecting log anomalies, including the minimum number of tokens per log sequence, sliding window intervals, and mask ratios. In controlled experiments, a 15-second sliding window with a 10-second overlap provided the best trade-off between detection effectiveness and inference latency, supporting real-time monitoring requirements. Quantitative evaluation shows that shorter sliding windows, despite capturing less context, yielded slightly higher detection performance on anomalous logs. The model achieved high accuracy in distinguishing normal from abnormal log sequences, where a sequence was classified as anomalous if more than 10% of its masked tokens were incorrectly predicted. A qualitative assessment with domain experts further validated the operational usefulness of the approach, indicating reduced manual monitoring effort and suitability for integration into AIOps pipelines under confidentiality constraints.
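The two numeric settings named in the abstract, the 15-second window with a 10-second overlap and the 10% misprediction threshold, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `(timestamp, log_key)` event format, and the idea of passing a list of candidate predictions per masked position are assumptions made for illustration. The abstract does not say whether a masked token counts as correct only for the top-1 prediction or for any of the top-k candidates, so the sketch accepts either. The masked-language model itself is left out, and its predictions are taken as given.

```python
from typing import List, Sequence, Tuple

WINDOW_SECONDS = 15    # window length reported in the abstract
OVERLAP_SECONDS = 10   # overlap reported in the abstract
STRIDE_SECONDS = WINDOW_SECONDS - OVERLAP_SECONDS  # 5 s between window starts
MISS_THRESHOLD = 0.10  # anomalous if more than 10% of masked tokens are missed


def sliding_windows(
    events: Sequence[Tuple[float, str]],
    window: float = WINDOW_SECONDS,
    stride: float = STRIDE_SECONDS,
) -> List[List[str]]:
    """Group (timestamp_seconds, log_key) events into overlapping windows.

    Assumes `events` is sorted by timestamp. Each window covers the
    half-open interval [start, start + window); empty windows are skipped.
    """
    if not events:
        return []
    windows: List[List[str]] = []
    start = events[0][0]
    last = events[-1][0]
    while start <= last:
        end = start + window
        keys = [key for ts, key in events if start <= ts < end]
        if keys:
            windows.append(keys)
        start += stride
    return windows


def is_anomalous(
    masked_truth: Sequence[str],
    candidates: Sequence[Sequence[str]],
    threshold: float = MISS_THRESHOLD,
) -> bool:
    """Flag a sequence when the share of missed masked tokens exceeds `threshold`.

    `candidates[i]` holds the model's predictions for masked position i
    (hypothetical format: one element for top-1, several for top-k).
    """
    if len(masked_truth) != len(candidates):
        raise ValueError("one candidate list is required per masked token")
    if not masked_truth:
        return False
    misses = sum(1 for truth, cands in zip(masked_truth, candidates)
                 if truth not in cands)
    return misses / len(masked_truth) > threshold


if __name__ == "__main__":
    demo = [(0, "a"), (3, "b"), (7, "c"), (16, "d")]
    print(sliding_windows(demo))
    print(is_anomalous(["a"] * 10, [["a"]] * 8 + [["x"], ["y"]]))
```

With a 10-second overlap, consecutive windows start 5 seconds apart, so each log line can fall into up to three windows. The strict `>` comparison follows the abstract's "more than 10%" wording, so a sequence with exactly 10% missed tokens is not flagged.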