An Effective Approach for Parsing Large Log Files

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME). Publication date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00009

Issam Sedki, A. Hamou-Lhadj, O. Mohamed, M. Shehab

Citations: 6

Abstract

Because of their contribution to the overall reliability assurance process, software logs have become important data assets for the analysis of software systems. Logs are often the only data points that can shed light on how a software system behaves once deployed. Unfortunately, logs are often unstructured, which hinders the analysis of their content. Several studies have aimed to parse large log files automatically; their primary goal is to create templates from raw log samples that can later be used to recognize future logs. In this paper, we propose ULP, a Unified Log Parsing tool that is both highly accurate and efficient. ULP combines string matching and local frequency analysis to parse large log files efficiently. First, log events are organized into groups using a text processing method. Frequency analysis is then applied locally to the instances of each group to identify the static and dynamic content of log events. When applied to 10 log datasets of the LogPai benchmark, ULP achieves an average accuracy of 89.2%, outperforming four leading log parsing tools: Drain, Logram, SPELL, and AEL. Additionally, ULP can parse up to four million log events in less than 3 minutes. ULP is available online as open source and can be readily used by practitioners and researchers to parse large log files effectively and efficiently in support of log analysis tasks.
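The two-stage pipeline described in the abstract (group log events first, then run frequency analysis locally within each group) can be illustrated with a short sketch. This is not ULP's actual implementation: the masking regex, the grouping key (token count plus first token), and the rule that a token position is static only when every member of its group agrees are all hypothetical choices made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rule (not taken from ULP): tokens made only of digits and
# dots (integers, decimals, IPv4 addresses) or hex literals are treated as
# obvious variables and replaced with a wildcard before grouping.
_VARIABLE = re.compile(r"^(\d+(\.\d+)*|0x[0-9a-fA-F]+)$")
WILDCARD = "<*>"


def mask(line):
    """Split a log line on whitespace and mask tokens that look like variables."""
    return [WILDCARD if _VARIABLE.match(tok) else tok for tok in line.split()]


def template_for(group):
    """Local frequency analysis over one group of equal-length token lists.

    A position is kept as static text only if every instance in the group has
    the same token there; otherwise it is marked dynamic with the wildcard.
    """
    size = len(group)
    template = []
    for column in zip(*group):
        token, count = Counter(column).most_common(1)[0]
        template.append(token if count == size else WILDCARD)
    return " ".join(template)


def parse(lines):
    """Map the index of each non-blank input line to its inferred template."""
    # Hypothetical grouping key: same number of tokens and same first token.
    groups = defaultdict(list)
    for idx, line in enumerate(lines):
        tokens = mask(line)
        if tokens:  # skip blank lines
            groups[(len(tokens), tokens[0])].append((idx, tokens))

    result = {}
    for members in groups.values():
        tpl = template_for([tokens for _, tokens in members])
        for idx, _ in members:
            result[idx] = tpl
    return result


if __name__ == "__main__":
    logs = [
        "Receiving block blk_123 src 10.0.0.1",
        "Receiving block blk_456 src 10.0.0.2",
        "Deleting block blk_789",
    ]
    for i, tpl in sorted(parse(logs).items()):
        print(i, tpl)
```

In this sketch the first two lines share a group, so the differing block IDs are marked dynamic and the template becomes `Receiving block <*> src <*>`. The third line forms a group of one, so its block ID stays static; singleton groups are one reason a practical parser would likely pair frequency analysis with additional string matching rather than rely on it alone.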
