{"title":"Cross-Modal Attention Networks for Multi-Modal Anomaly Detection in System Software","authors":"Suchuan Xing;Yihan Wang","doi":"10.1109/OJCS.2025.3607975","DOIUrl":null,"url":null,"abstract":"Anomaly detection in system software traditionally relies on single-modal algorithms that analyze either discrete log events or continuous performance metrics in isolation, potentially missing complex anomalies that manifest across both modalities. We present a deep learning framework that leverages cross-modal attention to jointly model log sequences and performance metrics for enhanced anomaly detection. Our method employs Long Short-Term Memory (LSTM) networks to capture temporal dependencies in log event sequences and Temporal Convolutional Networks (TCNs) to model performance-metric time series. The core innovation is a cross-modal attention mechanism that dynamically weights log-event and metric features based on inter-modal relationships, enabling the detection of subtle anomalies that require contextual information from both data sources. Unlike conventional multi-modal fusion techniques that merely concatenate features, our attention mechanism explicitly models the dependencies between log patterns and metric behaviors, allowing the network to focus on relevant log events during metric anomalies and vice versa. We conduct comprehensive experiments on public datasets, including HDFS and BGL logs paired with cloud-computing performance metrics, as well as in real-world cloud environments. Our method achieves significant improvements over single-modal baselines, with F1-scores increasing by 12.3% on average across datasets. Ablation studies confirm the effectiveness of the cross-modal attention mechanism, while real-time deployment experiments using Apache Flink demonstrate practical applicability with sub-second latency. The proposed framework addresses a critical gap in system software monitoring by providing a principled approach to multi-modal anomaly detection that scales to enterprise-level deployments.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"1463-1474"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11153984","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11153984/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
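The abstract describes a cross-modal attention mechanism in which features from one modality (e.g. metric windows) attend over features from the other (log-event embeddings). The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of one direction of such a cross-attention step, with illustrative names, dimensions, and a simple concatenation fusion that are assumptions rather than details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(log_feats, metric_feats):
    """One direction of cross-modal attention: each metric-window
    embedding attends over all log-event embeddings (illustrative
    sketch; the paper's mechanism is bidirectional and learned)."""
    d = log_feats.shape[-1]
    # Scaled dot-product scores between metric queries and log keys.
    scores = metric_feats @ log_feats.T / np.sqrt(d)   # (T_m, T_l)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    attended = weights @ log_feats                     # (T_m, d)
    # Assumed fusion: concatenate the attended log context with the
    # original metric features before a downstream anomaly scorer.
    return np.concatenate([metric_feats, attended], axis=-1)

rng = np.random.default_rng(0)
log_feats = rng.standard_normal((8, 16))     # 8 log-event embeddings, dim 16
metric_feats = rng.standard_normal((5, 16))  # 5 metric-window embeddings
fused = cross_modal_attention(log_feats, metric_feats)
print(fused.shape)  # (5, 32)
```

In a full model, `log_feats` would come from the LSTM over parsed log events and `metric_feats` from the TCN over metric series, with learned query/key/value projections replacing the raw dot products used here.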