Drain: An Online Log Parsing Approach with Fixed Depth Tree

Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu
{"title":"Drain: An Online Log Parsing Approach with Fixed Depth Tree","authors":"Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu","doi":"10.1109/ICWS.2017.13","DOIUrl":null,"url":null,"abstract":"Logs, which record valuable system runtime information, have been widely employed in Web service management by service providers and users. A typical log analysis based Web service management procedure is to first parse raw log messages because of their unstructured format, and then apply data mining models to extract critical system behavior information, which can assist Web service management. Most of the existing log parsing methods focus on offline, batch processing of logs. However, as the volume of logs increases rapidly, model training of offline log parsing methods, which employs all existing logs after log collection, becomes time consuming. To address this problem, we propose an online log parsing method, namely Drain, that can parse logs in a streaming and timely manner. To accelerate the parsing process, Drain uses a fixed depth parse tree, which encodes specially designed rules for parsing. We evaluate Drain on five real-world log data sets with more than 10 million raw log messages. The experimental results show that Drain has the highest accuracy on four data sets, and comparable accuracy on the remaining one. Besides, Drain obtains 51.85%~81.47% improvement in running time compared with the state-of-the-art online parser. We also conduct a case study on an anomaly detection task using Drain in the parsing step, which determines the effectiveness of Drain in log analysis.","PeriodicalId":235426,"journal":{"name":"2017 IEEE International Conference on Web Services (ICWS)","volume":"129 33","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"380","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Web Services (ICWS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICWS.2017.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 380

Abstract

Logs, which record valuable system runtime information, have been widely employed in Web service management by service providers and users. A typical log analysis based Web service management procedure is to first parse raw log messages because of their unstructured format, and then apply data mining models to extract critical system behavior information, which can assist Web service management. Most of the existing log parsing methods focus on offline, batch processing of logs. However, as the volume of logs increases rapidly, model training of offline log parsing methods, which employs all existing logs after log collection, becomes time consuming. To address this problem, we propose an online log parsing method, namely Drain, that can parse logs in a streaming and timely manner. To accelerate the parsing process, Drain uses a fixed depth parse tree, which encodes specially designed rules for parsing. We evaluate Drain on five real-world log data sets with more than 10 million raw log messages. The experimental results show that Drain has the highest accuracy on four data sets, and comparable accuracy on the remaining one. Besides, Drain obtains 51.85%~81.47% improvement in running time compared with the state-of-the-art online parser. We also conduct a case study on an anomaly detection task using Drain in the parsing step, which determines the effectiveness of Drain in log analysis.
一种固定深度树的在线日志解析方法
日志记录了有价值的系统运行时信息,已被服务提供者和用户广泛应用于Web服务管理。典型的基于日志分析的Web服务管理过程是首先解析原始日志消息,因为它们的格式是非结构化的,然后应用数据挖掘模型提取关键的系统行为信息,这些信息可以帮助Web服务管理。大多数现有的日志解析方法侧重于离线、批量处理日志。但是,随着日志量的快速增长,离线日志解析方法的模型训练变得非常耗时,因为离线日志解析方法在日志收集后会使用所有现有的日志。为了解决这个问题,我们提出了一种在线日志解析方法,即Drain,它可以实时地分析日志。为了加速解析过程,Drain使用了一个固定深度的解析树,该树对专门设计的解析规则进行编码。我们在五个真实日志数据集上评估了Drain,这些数据集包含超过1000万条原始日志消息。实验结果表明,该方法在4个数据集上具有最高的准确率,在其余数据集上具有相当的准确率。此外,与最先进的在线解析器相比,Drain在运行时间上提高了51.85%~81.47%。我们还对在解析步骤中使用Drain的异常检测任务进行了案例研究,从而确定了Drain在日志分析中的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信