Towards a Classification of Log Parsing Errors
Issam Sedki, A. Hamou-Lhadj, O. Mohamed, Naser Ezzati-Jivan
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), published 2023-05-01
DOI: 10.1109/ICPC58990.2023.00023 (https://doi.org/10.1109/ICPC58990.2023.00023)
Citations: 1
Abstract
Log parsing is used to extract structure from unstructured log data. It is a key enabler for many software engineering tasks, including debugging, fault diagnosis, and anomaly detection. In recent years, the number of log parsing techniques and tools has increased, but their accuracy varies significantly. To improve log parsing tools, we need to understand the types of parsing errors they make, which is the purpose of this early research track paper. We achieve this by examining the errors made by four leading log parsing tools when applied to four log datasets generated from various systems. Based on this analysis, we suggest a preliminary classification of log parsing errors, which contains nine categories. We believe that this classification is a good starting point for improving the accuracy of log parsing tools and for defining better logging practices.