{"title":"Corpus-wide Analysis of Parser Behaviors via a Format Analysis Workbench","authors":"Pottayil Harisanker Menon, Walt Woods","doi":"10.1109/SPW59333.2023.00024","DOIUrl":null,"url":null,"abstract":"As the number of parsers written for a data format grows, the number of interpretations of that format's specification also grows. Often, these interpretations differ in subtle, hard-to-determine ways that can result in parser differentials – where one input passed to two parsing programs results in two semantically different behaviors. For example, two widely-used HTTP parsers have been shown to process packet headers differently, allowing for the exfiltration of private files. To help find, diagnose, and mitigate the risks of parser differentials, we present the Format Analysis Workbench (FAW), a collection of tools for collecting information on large numbers of parser/input interactions and analyzing those interactions to detect and explain differentials. This tool suite supports any number of file formats through a flexible configuration, allows for processing to be scaled horizontally, and can be run offline. It has been used for results including the analysis of more than 1 million PDF files and unifying parser behaviors across these files to identify a gold standard of validity across multiple parsers. The included statistical tools have been used to identify the root causes of parser rendering differentials, including mislabeled non-embedded fonts. Tools for instrumenting existing parsers are also included, such as PolyTracker, allowing for the analysis of blind spots which might be used to craft differentials for other parsers, or to exfiltrate large quantities of data. Through allowing users to characterize parser behaviors at scale against large corpuses of inputs, the FAW helps to mitigate security risks arising from parser behaviors by making it tractable to resolve examples of differentials back to their behavioral causes.","PeriodicalId":308378,"journal":{"name":"2023 IEEE Security and Privacy Workshops (SPW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Security and Privacy Workshops (SPW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPW59333.2023.00024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
As the number of parsers written for a data format grows, the number of interpretations of that format's specification also grows. Often, these interpretations differ in subtle, hard-to-determine ways that can result in parser differentials – where one input passed to two parsing programs results in two semantically different behaviors. For example, two widely-used HTTP parsers have been shown to process packet headers differently, allowing for the exfiltration of private files. To help find, diagnose, and mitigate the risks of parser differentials, we present the Format Analysis Workbench (FAW), a collection of tools for collecting information on large numbers of parser/input interactions and analyzing those interactions to detect and explain differentials. This tool suite supports any number of file formats through a flexible configuration, allows for processing to be scaled horizontally, and can be run offline. It has been used for results including the analysis of more than 1 million PDF files and unifying parser behaviors across these files to identify a gold standard of validity across multiple parsers. The included statistical tools have been used to identify the root causes of parser rendering differentials, including mislabeled non-embedded fonts. Tools for instrumenting existing parsers are also included, such as PolyTracker, allowing for the analysis of blind spots which might be used to craft differentials for other parsers, or to exfiltrate large quantities of data. Through allowing users to characterize parser behaviors at scale against large corpuses of inputs, the FAW helps to mitigate security risks arising from parser behaviors by making it tractable to resolve examples of differentials back to their behavioral causes.