Linnaeus: A highly reusable and adaptable ML based log classification pipeline

Armin Catovic, Carolyn Cartwright, Yasmin Tesfaldet Gebreyesus, Simone Ferlin Oliveira
2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN). Publication date: 2021-03-11. DOI: 10.1109/WAIN52551.2021.00008

Abstract

Logs are a common way to record detailed run-time information in software. As modern software systems grow in scale and complexity, logs have become indispensable for understanding the internal states of a system. At the same time, however, manually inspecting logs has become impractical. Recently, there has been growing emphasis on statistical and machine learning (ML)-based methods for analyzing logs. While the results have shown promise, most of the literature focuses on algorithms and state-of-the-art (SOTA) results, largely ignoring the practical aspects. In this paper we demonstrate Linnaeus, our end-to-end log classification pipeline. Besides showing the more traditional ML flow, we also demonstrate our solutions for adaptability and re-use, for integration into large-scale software development processes, and for coping with a lack of labelled data. We hope Linnaeus can serve as a blueprint for, and inspire the integration of, various ML-based solutions in other large-scale industrial settings.