Development of an Algorithm for Extracting and Encoding Data from Log Messages of a Computing System for Anomaly Detection Systems

Q4 Materials Science
G. A. Drachev
DOI: 10.17587/it.29.351-359 (https://doi.org/10.17587/it.29.351-359)
Journal: Radioelektronika, Nanosistemy, Informacionnye Tehnologii, volume/issue 38 (1)
Published: 2023-07-11 · Journal article
Citations: 0

Abstract

This article describes the development of an algorithm that automatically analyzes a log message, transforms it into a list of features in the form of a fixed-length vector, and accumulates the resulting vectors into a single dataset. The dataset is intended for use in machine-learning-based anomaly detection systems. An additional requirement for the algorithm is support for the variety of protocols used to collect log messages in a computer system. These goals were achieved by developing a software package that collects and parses log messages in order to isolate and encode their features. The package can collect log messages in several ways: via syslog, SNMP, and SQL, and by reading text and binary files. The data extracted from the log messages of the computing system is examined, and Lua scripts are supported for data enrichment. A list of features is generated. A method for encoding text data extracted from log messages is proposed, together with an algorithm that transforms an arbitrary log message into a feature vector of fixed dimension. A methodology is provided for forming a dataset for subsequent use in training the anomaly detection system of a computing system, and an example of a dataset storage structure is given.
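The abstract does not specify the encoding method, so the following is only a minimal Python sketch of the general idea under assumptions of our own. Each collected message (whether it arrived via syslog, SNMP, SQL, or a file) is assumed to have already been parsed, and optionally enriched, into a dictionary of named fields. The field names `severity`, `facility`, `pid`, `host`, and `msg`, the constants `NUMERIC_FIELDS` and `HASH_DIM`, and the functions `encode_message` and `write_dataset` are hypothetical. Text is encoded with feature hashing, a standard technique substituted here and not necessarily the author's method. Numeric fields occupy fixed slots, and text tokens are counted into a fixed number of hashed buckets, so every message maps to a vector of the same length regardless of which fields it contains. The vectors are then accumulated into one CSV dataset.

```python
import csv
import hashlib
import io
import re
from typing import Mapping, Sequence, TextIO

# Hypothetical feature layout (an assumption, not the paper's): a few numeric
# fields in fixed slots, followed by HASH_DIM buckets shared by all text tokens.
NUMERIC_FIELDS = ("severity", "facility", "pid")
HASH_DIM = 16
VECTOR_DIM = len(NUMERIC_FIELDS) + HASH_DIM

# Letters and underscores only: digits are dropped so that variable parts
# such as IP addresses, ports and PIDs inside text do not inflate the vocabulary.
_TOKEN_RE = re.compile(r"[A-Za-z_]+")


def _bucket(token: str) -> int:
    """Map a token to a stable bucket index (unlike hash(), not salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "little") % HASH_DIM


def encode_message(fields: Mapping[str, object]) -> list[float]:
    """Encode one parsed log message (field name -> value) as a fixed-length vector."""
    vector = [0.0] * VECTOR_DIM
    # Numeric fields: copy into their reserved slots; missing fields stay 0.0.
    for i, name in enumerate(NUMERIC_FIELDS):
        value = fields.get(name)
        if isinstance(value, (int, float)):
            vector[i] = float(value)
    # Text fields: tokenize and count each "field=token" pair in a hashed bucket.
    offset = len(NUMERIC_FIELDS)
    for name, value in fields.items():
        if isinstance(value, str):
            for token in _TOKEN_RE.findall(value.lower()):
                vector[offset + _bucket(f"{name}={token}")] += 1.0
    return vector


def write_dataset(messages: Sequence[Mapping[str, object]], out: TextIO) -> int:
    """Write a header plus one CSV row per encoded message; return the row count."""
    writer = csv.writer(out)
    writer.writerow(list(NUMERIC_FIELDS) + [f"h{i}" for i in range(HASH_DIM)])
    for fields in messages:
        writer.writerow(encode_message(fields))
    return len(messages)


if __name__ == "__main__":
    # Two made-up messages with different field sets; both yield VECTOR_DIM values.
    logs = [
        {"severity": 3, "facility": 4, "pid": 1021, "host": "web01",
         "msg": "Failed password for root from 10.0.0.5"},
        {"severity": 6, "facility": 1, "msg": "session opened for user alice"},
    ]
    buf = io.StringIO()
    write_dataset(logs, buf)
    print(buf.getvalue())
```

Hashing keeps the dimension fixed even for vocabulary never seen before, at the cost of occasional collisions between tokens. Prefixing each token with its field name lets the same word in, say, `host` and `msg` land in different buckets (collisions aside).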
Source journal: Radioelektronika, Nanosistemy, Informacionnye Tehnologii
Subject area: Materials Science (miscellaneous)
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 38
About the journal: The journal "Radioelectronics. Nanosystems. Information Technologies" (abbreviated RENSIT) publishes original, previously unpublished articles, reviews, and brief reports on topical problems in radioelectronics (including biomedical radioelectronics), the fundamentals of information technologies, nanotechnologies, and biotechnologies, and adjacent areas of physics and mathematics. Its authors are academicians, corresponding members, and foreign members of the Russian Academy of Natural Sciences (RANS) and their colleagues, as well as other Russian and foreign authors recommended by RANS members. The author can obtain such a recommendation before submitting the article to the editor, or after submission, from a member of the editorial board or another RANS member who gives an opinion on the article at the editor's request. The editors accept articles in both Russian and English. Articles undergo internal double-blind peer review by members of the editorial board, and some articles also undergo external review if necessary. The journal is intended for researchers, graduate students, senior physics students, and teachers. It is published twice a year (two issues).