Research Report: Progress on Building a File Observatory for Secure Parser Development

2022 IEEE Security and Privacy Workshops (SPW). Pub Date: 2022-05-01. DOI: 10.1109/spw54247.2022.9833875

Tim Allison, Wayne Burke, Dustin Graf, C. Mattmann, Anastasija Mensikova, Michael Milano, Philip Southam, R. Stonebraker

Citations: 1

Abstract

Parsing untrusted data is notoriously challenging. Failure to handle maliciously crafted data correctly can (and does) lead to a wide range of vulnerabilities. The language-theoretic security (LangSec) philosophy seeks to obviate the need for developers to apply ad hoc solutions by offering, instead, formally correct and verifiable input handling throughout the software development lifecycle. One of the key components in developing secure parsers is a broad-coverage corpus that enables developers to understand the problem space for a given format and that can potentially serve as a source of seeds for fuzzing and other automated testing. In this paper, we offer an update on work reported at the LangSec 2021 conference on the development of a file observatory to gather, and enable analysis of, a diverse collection of files at scale. The initial focus of the observatory is on Portable Document Format (PDF) files and file formats typically embedded in PDFs. We report on refactoring the ingest process, applying new analytic methods, and improving the user interface.

