LSTM-AE for Anomaly Detection on Multivariate Telemetry Data
Anes Abdennebi, Alp Tunçay, Cemal Yilmaz, Anil Koyuncu, Oktay Gungor
2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), 2023-05-23
DOI: 10.1109/SERA57763.2023.10197673 (https://doi.org/10.1109/SERA57763.2023.10197673)
Abstract
Organizations and companies that collect data generated by sales, transactions, client/server communications, IoT nodes, devices, engines, or any other data-generating or data-exchanging source need to analyze this data to gain insight into the activity running on their systems. Streaming data is multivariate, and its variables bear dependencies on one another that extend temporally (to previous time steps). Long Short-Term Memory (LSTM) is a variant of recurrent neural networks capable of learning long-term dependencies from the previous time steps of sequence-shaped data. An LSTM model is therefore a valid option for offline anomaly detection on our data and can help foresee future system incidents. Anything that negatively affects the system and the services provided through it is considered an incident. Moreover, the raw input data might be noisy and unsuitable for the model, leading to misleading predictions. A wiser choice is an LSTM Autoencoder (LSTM-AE), which is specialized for extracting meaningful features of the examined data and looks back several steps to preserve temporal dependencies. In our work, we developed two LSTM-AE models and evaluated them in an industrial setup at Koçfinans (a finance company operating in Turkey), which runs a distributed system of several nodes hosting dozens of microservices. The results show that our trained LSTM-AE models identified atypical behavior in offline data with high accuracy. Furthermore, after deploying the models, we identified the system failing at the exact times of the two previously reported failures. The deployed models also raised warnings a week before the actual failure, demonstrating their effectiveness on online data. Our models achieved 99.7% accuracy and an F1-score of 89.1%. Moreover, the study shows potential for finding the proper LSTM-AE model architecture when time series data with temporal dependencies is fed to the model.
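
The "looking back several steps" idea can be illustrated with a minimal Python sketch. It slices a multivariate series into overlapping lookback windows and flags a window as anomalous when its reconstruction error exceeds a threshold. The abstract does not state how the authors turn model output into anomaly decisions, so reconstruction-error thresholding is an assumption here (it is a common choice for autoencoders). The names `make_windows`, `reconstruction_error`, `flag_anomalies`, and `typical_profile_reconstructor` are hypothetical, the data and threshold are invented for illustration, and the reconstructor is a trivial stand-in for a trained LSTM-AE, not the authors' model.

```python
from typing import Callable, List, Sequence

Window = List[List[float]]  # shape: lookback x n_features


def make_windows(series: Sequence[Sequence[float]], lookback: int) -> List[Window]:
    """Slice a (time steps x features) series into overlapping lookback windows."""
    if lookback < 1 or lookback > len(series):
        raise ValueError("lookback must be between 1 and len(series)")
    return [
        [list(row) for row in series[start:start + lookback]]
        for start in range(len(series) - lookback + 1)
    ]


def reconstruction_error(window: Window, reconstructed: Window) -> float:
    """Mean squared error over every time step and feature in the window."""
    total = 0.0
    count = 0
    for row, rec_row in zip(window, reconstructed):
        for value, rec_value in zip(row, rec_row):
            total += (value - rec_value) ** 2
            count += 1
    return total / count


def flag_anomalies(
    windows: List[Window],
    reconstruct: Callable[[Window], Window],
    threshold: float,
) -> List[bool]:
    """Mark a window as anomalous when its reconstruction error exceeds threshold."""
    return [reconstruction_error(w, reconstruct(w)) > threshold for w in windows]


# Hypothetical stand-in for a trained LSTM-AE: it always "reconstructs"
# the typical operating profile, so unusual windows reconstruct poorly.
TYPICAL_PROFILE = [1.0, 10.0]


def typical_profile_reconstructor(window: Window) -> Window:
    return [list(TYPICAL_PROFILE) for _ in window]


# Invented two-feature telemetry with one abnormal time step (index 4).
series = [
    [1.0, 10.0],
    [1.1, 10.2],
    [0.9, 9.8],
    [1.0, 10.1],
    [5.0, 30.0],
    [1.0, 10.0],
]

windows = make_windows(series, lookback=2)
flags = flag_anomalies(windows, typical_profile_reconstructor, threshold=1.0)
print(flags)
```

In this sketch the two windows that contain the abnormal time step are the ones flagged. In a real setup the threshold would typically be chosen from the error distribution on data known to be normal, which is likewise an assumption rather than something the abstract specifies.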