Investigating and improving log parsing in practice

软件产业与工程 (Software Industry and Engineering) Pub Date: 2022-11-07 DOI: 10.1145/3540250.3558947

Ying Fu, Meng Yan, Jian Xu, Jianguo Li, Zhongxin Liu, Xiaohong Zhang, Dan Yang

Citations: 7

Abstract

Logs are widely used to diagnose system behavior through automated log mining. Log parsing is an important preprocessing step that converts semi-structured log messages into structured data, which then serves as the feature input for log mining. Many studies propose new log parsers, but to the best of our knowledge, no previous study comprehensively investigates how effective log parsers are in industrial practice. In this paper, we conduct an empirical study of six state-of-the-art log parsers on 10 microservice applications of Ant Group. Our empirical results highlight two challenges for log parsing in practice: 1) Varied separators. A single log message can contain several different separators, and the separators also differ across event templates and across applications. Current log parsers perform poorly because they do not account for this variety. 2) Varied lengths due to nested objects. Log messages belonging to the same event template may have different lengths because of nested objects. This occurs in the log messages of 6 of the 10 microservice applications at Ant Group, and 4 of the 6 state-of-the-art log parsers cannot handle it. To address these two challenges, we propose Drain+, an improved log parser based on the state-of-the-art log parser Drain. Drain+ adds two components: a statistics-based separator generation component, which automatically generates the separators used to split log messages, and a candidate event template merging component, which merges candidate event templates using a template similarity method. We evaluate Drain+ on 10 microservice applications of Ant Group and 16 public datasets. The results show that Drain+ outperforms the six state-of-the-art log parsers on both the industrial applications and the public datasets. Finally, we summarize our observations on the road ahead for log parsing to inspire other researchers and practitioners.
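
The two components named in the abstract can be illustrated with a small Python sketch. This is not the Drain+ implementation: the abstract does not specify the statistics or the similarity measure, so the sketch assumes a simple document-frequency rule for choosing separators and a token-sequence ratio from Python's `difflib` for merging. The function names, the `min_share` and `threshold` parameters, and the sample messages are all hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

WILDCARD = "<*>"  # placeholder for variable parts of a template


def infer_separators(messages, min_share=0.5):
    """Assumed rule: keep every non-alphanumeric, non-space character
    that occurs in at least `min_share` of the messages."""
    doc_freq = Counter()
    for msg in messages:
        doc_freq.update({ch for ch in msg if not ch.isalnum() and not ch.isspace()})
    cutoff = min_share * len(messages)
    return sorted(ch for ch, n in doc_freq.items() if n >= cutoff)


def tokenize(message, separators):
    """Split on whitespace plus the inferred separators, dropping empty tokens."""
    pattern = "[" + re.escape("".join(separators)) + r"\s]+"
    return [tok for tok in re.split(pattern, message) if tok]


def similarity(a, b):
    """Token-level similarity in [0, 1]; tolerates different lengths."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def merge(a, b):
    """Keep tokens shared by both templates; collapse each differing run
    (including inserted nested fields) into a single wildcard."""
    merged = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])
        elif not merged or merged[-1] != WILDCARD:
            merged.append(WILDCARD)
    return merged


def merge_templates(candidates, threshold=0.6):
    """Greedily fold each candidate into the first sufficiently similar template."""
    templates = []
    for cand in candidates:
        for i, tmpl in enumerate(templates):
            if similarity(tmpl, cand) >= threshold:
                templates[i] = merge(tmpl, cand)
                break
        else:
            templates.append(list(cand))
    return templates


if __name__ == "__main__":
    # Hypothetical messages: the second carries an extra nested field.
    logs = [
        "op=query|user=alice|status=ok",
        "op=query|user=bob|role=admin|status=ok",
        "cache miss key=k1",
    ]
    seps = infer_separators(logs)
    for template in merge_templates([tokenize(m, seps) for m in logs]):
        print(" ".join(template))
```

Because the similarity is computed over token sequences rather than position by position, two messages of different lengths can still land in the same template, which is the situation the second challenge describes. A parser that first groups messages by token count would keep them apart.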
