Prashant Anantharaman, Steven Cheung, Nicholas Boorman, M. Locasto
{"title":"A Format-Aware Reducer for Scriptable Rewriting of PDF Files","authors":"Prashant Anantharaman, Steven Cheung, Nicholas Boorman, M. Locasto","doi":"10.1109/spw54247.2022.9833885","DOIUrl":null,"url":null,"abstract":"Sanitizing untrusted input is a significant unsolved problem in defensive cybersecurity and input handling. Even if we assume that a safe, provably correct parser exists to validate the input syntax, processing logic may still require the application of certain transformations of the parser output. For example, parsers traditionally store the parsed objects in a generic tree structure; hence the processing logic needed to modify this structure can be significant. Also, popular parsing tools do not include the functionality to serialize (or unparse) the internal structure back to bytes.This paper argues for the need for a format-aware tool to modify structured files. In particular, we propose adding a reducer to the Parsley PDF checker. The Parsley Reducer— a tool to apply transformations on input dynamically—would allow developers to design and implement rules to transform PDF files. Next, we describe a set of Parsley normalization tools to be used with the Reducer API and showcase their capabilities using several case studies. Finally, we evaluate our normalization approach to demonstrate that (1) the developer effort to design our reducer rules is minimal, (2) tools extract more text from transformed files than the original files, and (3) other popular PDF transformation tools do not apply the corrections we demonstrate.","PeriodicalId":334852,"journal":{"name":"2022 IEEE Security and Privacy Workshops (SPW)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Security and Privacy Workshops (SPW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/spw54247.2022.9833885","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Sanitizing untrusted input is a significant unsolved problem in defensive cybersecurity and input handling. Even if we assume that a safe, provably correct parser exists to validate the input syntax, processing logic may still require the application of certain transformations of the parser output. For example, parsers traditionally store the parsed objects in a generic tree structure; hence the processing logic needed to modify this structure can be significant. Also, popular parsing tools do not include the functionality to serialize (or unparse) the internal structure back to bytes.This paper argues for the need for a format-aware tool to modify structured files. In particular, we propose adding a reducer to the Parsley PDF checker. The Parsley Reducer— a tool to apply transformations on input dynamically—would allow developers to design and implement rules to transform PDF files. Next, we describe a set of Parsley normalization tools to be used with the Reducer API and showcase their capabilities using several case studies. Finally, we evaluate our normalization approach to demonstrate that (1) the developer effort to design our reducer rules is minimal, (2) tools extract more text from transformed files than the original files, and (3) other popular PDF transformation tools do not apply the corrections we demonstrate.