Research Report: ICARUS: Understanding De Facto Formats by Way of Feathers and Wax

2020 IEEE Security and Privacy Workshops (SPW), Pub Date: 2020-05-01, DOI: 10.1109/SPW50608.2020.00067

Sam Cowger, Yerim Lee, Nichole Schimanski, Mark Tullsen, Walter Woods, Richard Jones, E. W. Davis, William Harris, Trent Brunson, Carson Harmon, Bradford Larsen, E. Sultanik


Citations: 6

Abstract


When a data format achieves a significant level of adoption, the presence of multiple format implementations expands the original specification in often-unforeseen ways. This results in an implicitly defined, de facto format, which can create vulnerabilities in programs handling the associated data files. In this paper we present our initial work on ICARUS: a toolchain for understanding and hardening de facto file formats. We show the results of our work in progress in the following areas: labeling and categorizing a corpus of data-format samples to understand the accepted variations of a format; detecting sublanguages within the de facto format using both entropy-based and taint-tracking-based methods, as a means of breaking down the larger problem of learning how the grammar has evolved; inferring grammars via reinforcement learning, as a means of tying together the learned sublanguages; and defining both safe subsets of the de facto grammar and translations from unsafe regions of the de facto grammar into safe regions. Real-world data formats evolve as they find use in real-world applications, and a comprehensive ICARUS toolchain for understanding and hardening the resulting de facto formats can identify and address the security risks arising from this evolution.
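The abstract names entropy-based sublanguage detection without describing it. As a minimal sketch under our own assumptions, and not the ICARUS implementation, the Python below computes the Shannon entropy of fixed-size windows of a file and merges neighboring windows into high- and low-entropy runs; high-entropy runs are candidates for embedded compressed or encrypted data. The function names `shannon_entropy` and `segment_by_entropy`, the window size, and the threshold are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    n = len(chunk)
    if n == 0:
        return 0.0
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def segment_by_entropy(
    data: bytes, window: int = 256, threshold: float = 6.0
) -> list[tuple[int, int, bool]]:
    """Label fixed, non-overlapping windows as high or low entropy and merge
    adjacent windows with the same label into (start, end, is_high) runs.

    Hypothetical defaults: a w-byte window can reach at most
    log2(min(w, 256)) bits/byte, so `threshold` must stay below that bound.
    The final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    runs: list[tuple[int, int, bool]] = []
    for start in range(0, len(data), window):
        chunk = data[start:start + window]
        end = start + len(chunk)
        is_high = shannon_entropy(chunk) >= threshold
        if runs and runs[-1][2] == is_high:
            # Same label as the previous run: extend it.
            runs[-1] = (runs[-1][0], end, is_high)
        else:
            runs.append((start, end, is_high))
    return runs


if __name__ == "__main__":
    # Low-entropy bytes, then 256 distinct byte values, then more low-entropy bytes.
    sample = b"A" * 128 + bytes(range(256)) + b"B" * 64
    print(segment_by_entropy(sample, window=64, threshold=5.0))
    # -> [(0, 128, False), (128, 384, True), (384, 448, False)]
```

Splitting a file into such runs is one way to hand smaller, more homogeneous regions to a later grammar-inference step. Taint tracking, which the abstract also mentions, would instead attribute regions by how a parser consumes them and is not shown here.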
