GenLog: Accurate Log Template Discovery for Stripped X86 Binaries

Maosheng Zhang, Ying Zhao, Zengmingyu He
{"title":"GenLog: Accurate Log Template Discovery for Stripped X86 Binaries","authors":"Maosheng Zhang, Ying Zhao, Zengmingyu He","doi":"10.1109/COMPSAC.2017.137","DOIUrl":null,"url":null,"abstract":"Log analysis plays an important role for computer failure diagnosis. With the ever increasing size and complexity of logs, the task of analyzing logs has become cumbersome to carry out manually. For this reason, recent research has focused on automatic analysis techniques for large log files. However, log messages are texts with certain formats and it is very challenging for automatic analysis to understand the semantic meanings of log messages. The current state-of-the-art approaches depend on the quality of observed log messages or source code producing these log messages. In this paper, we propose a method GenLog that can extract log templates from stripped executables (neither source code nor debugging information need to be available). GenLog finds all log related functions in a binary through a combined bottom-up and top down slicing method, reconstructs the memory buffers where log messages were constructeStripped X86 Binaries d, and identifies components of log messages using data flow analysis and taint propagation analysis. GenLog can be used to analyze large binary code, and is suitable for commercial off-the-shelf (COTS) software or dynamic libraries. We evaluated GenLog on four X86 executables and one of them is Nginx. The experiments show that GenLog can identify the template for log messages in testing log files with a precision of 99.9%.","PeriodicalId":6556,"journal":{"name":"2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC)","volume":"24 1","pages":"337-346"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC.2017.137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Log analysis plays an important role for computer failure diagnosis. With the ever increasing size and complexity of logs, the task of analyzing logs has become cumbersome to carry out manually. For this reason, recent research has focused on automatic analysis techniques for large log files. However, log messages are texts with certain formats and it is very challenging for automatic analysis to understand the semantic meanings of log messages. The current state-of-the-art approaches depend on the quality of observed log messages or source code producing these log messages. In this paper, we propose a method GenLog that can extract log templates from stripped executables (neither source code nor debugging information need to be available). GenLog finds all log related functions in a binary through a combined bottom-up and top down slicing method, reconstructs the memory buffers where log messages were constructeStripped X86 Binaries d, and identifies components of log messages using data flow analysis and taint propagation analysis. GenLog can be used to analyze large binary code, and is suitable for commercial off-the-shelf (COTS) software or dynamic libraries. We evaluated GenLog on four X86 executables and one of them is Nginx. The experiments show that GenLog can identify the template for log messages in testing log files with a precision of 99.9%.
GenLog:准确的日志模板发现剥离X86二进制文件
日志分析在计算机故障诊断中起着重要的作用。随着日志的大小和复杂性的不断增加,手工分析日志的任务变得非常繁琐。由于这个原因,最近的研究集中在大型日志文件的自动分析技术上。但是,日志消息是具有特定格式的文本,对于自动分析来说,理解日志消息的语义意义是非常具有挑战性的。当前最先进的方法依赖于观察到的日志消息或生成这些日志消息的源代码的质量。在本文中,我们提出了一种方法GenLog,它可以从剥离的可执行文件中提取日志模板(既不需要源代码也不需要调试信息)。GenLog通过结合自底向上和自顶向下的切片方法,在二进制文件中找到所有与日志相关的函数,重建构建日志消息的内存缓冲区(剥离X86 Binaries d),并使用数据流分析和污染传播分析识别日志消息的组件。GenLog可用于分析大型二进制代码,适用于商用现货(COTS)软件或动态库。我们在四个X86可执行文件上评估了GenLog,其中一个是Nginx。实验表明,GenLog可以识别测试日志文件中日志消息的模板,准确率达到99.9%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信