GPT-2C: a parser for honeypot logs using large pre-trained language models

Febrian Setianto, Erion Tsani, Fatima Sadiq, Georgios Domalis, Dimitris Tsakalidis, Panos Kostakos
{"title":"GPT-2C: a parser for honeypot logs using large pre-trained language models","authors":"Febrian Setianto, Erion Tsani, Fatima Sadiq, Georgios Domalis, Dimitris Tsakalidis, Panos Kostakos","doi":"10.1145/3487351.3492723","DOIUrl":null,"url":null,"abstract":"Deception technologies like honeypots generate large volumes of log data, which include illegal Unix shell commands used by latent intruders. Several prior works have reported promising results in overcoming the weaknesses of network-level and program-level Intrusion Detection Systems (IDSs) by fussing network traffic with data from honeypots. However, because honeypots lack the plug-in infrastructure to enable real-time parsing of log outputs, it remains technically challenging to feed illegal Unix commands into downstream predictive analytics. As a result, advances on honeypot-based user-level IDSs remain greatly hindered. This article presents a run-time system (GPT-2C) that leverages a large pre-trained language model (GPT-2) to parse dynamic logs generated by a live Cowrie SSH honeypot instance. After fine-tuning the GPT-2 model on an existing corpus of illegal Unix commands, the model achieved 89% inference accuracy in parsing Unix commands with acceptable execution latency.","PeriodicalId":320904,"journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3487351.3492723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Deception technologies like honeypots generate large volumes of log data, which include illegal Unix shell commands used by latent intruders. Several prior works have reported promising results in overcoming the weaknesses of network-level and program-level Intrusion Detection Systems (IDSs) by fussing network traffic with data from honeypots. However, because honeypots lack the plug-in infrastructure to enable real-time parsing of log outputs, it remains technically challenging to feed illegal Unix commands into downstream predictive analytics. As a result, advances on honeypot-based user-level IDSs remain greatly hindered. This article presents a run-time system (GPT-2C) that leverages a large pre-trained language model (GPT-2) to parse dynamic logs generated by a live Cowrie SSH honeypot instance. After fine-tuning the GPT-2 model on an existing corpus of illegal Unix commands, the model achieved 89% inference accuracy in parsing Unix commands with acceptable execution latency.
GPT-2C:使用大型预训练语言模型的蜜罐日志解析器
蜜罐之类的欺骗技术会生成大量日志数据,其中包括潜在入侵者使用的非法Unix shell命令。一些先前的工作已经报告了克服网络级和程序级入侵检测系统(ids)的弱点的有希望的结果,通过混淆来自蜜罐的数据的网络流量。但是,由于蜜罐缺乏支持日志输出实时解析的插件基础设施,因此将非法Unix命令提供给下游预测分析仍然具有技术挑战性。因此,基于蜜罐的用户级入侵防御系统的进展仍然受到很大阻碍。本文介绍了一个运行时系统(GPT-2C),它利用一个大型预训练语言模型(GPT-2)来解析由一个实时的Cowrie SSH蜜罐实例生成的动态日志。在现有的非法Unix命令语料库上对GPT-2模型进行微调后,该模型在解析Unix命令时达到了89%的推理准确率,并且执行延迟是可以接受的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信