Research Report: Strengthening Weak Links in the PDF Trust Chain
Authors: Mark Tullsen, William Harris, P. Wyatt
Venue: 2022 IEEE Security and Privacy Workshops (SPW), published 2022-05-01
DOI: 10.1109/spw54247.2022.9833889 (https://doi.org/10.1109/spw54247.2022.9833889)
In many practical and security-critical formats, the interpretation of a document segment as a Document Object Model (DOM) graph depends on a concept of reference and complex contextual data that binds references to data objects. Such referential context itself is defined discontinuously, and is often compressed, to satisfy practical constraints on usability and performance. The integrity of these references and their context must be ensured so that an unambiguous DOM graph is established from a basis of trust. This paper describes a case study of a critical instance of such a design, namely the construction of PDF cross-reference data, in the presence of potentially multiple incremental updates and multiple complex dialects expressing these references. Over the course of our case study, we found that the full definition of cross-reference data in PDF contains several subtleties that are interpreted differently by natural implementations, but which can nevertheless be formalized using monadic parsers with constructs for explicitly capturing and updating input streams. Producing our definition raised several issues in the PDF standard acknowledged and addressed by the PDF Association and the ISO. In the future, the definition can serve as a foundation for implementing novel format security analyses of DOM-defining formats.
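To make the cross-reference construction problem concrete, the following is a minimal Python sketch of how a reader might merge classic cross-reference tables across incremental updates. It is not the paper's monadic formalization: it is a plain imperative illustration in which the explicit jumps to a new `offset` stand in for the "updating the input stream" operation mentioned above. The function names `parse_xref_section` and `build_xref` are hypothetical, and the sketch assumes only classic `xref` tables (no cross-reference streams or hybrid files) and a well-formed trailer.

```python
import re
from typing import Dict, Optional, Tuple

# object number -> (first field, generation, in_use)
# For in-use entries the first field is a byte offset; for free
# entries it is the number of the next free object.
XRefTable = Dict[int, Tuple[int, int, bool]]

XREF = re.compile(rb"xref\s+")
SUBSECTION = re.compile(rb"(\d+) (\d+)[ \t]*[\r\n]+")
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])[ \t]*[\r\n]+")
PREV = re.compile(rb"/Prev\s+(\d+)")
STARTXREF = re.compile(rb"startxref\s+(\d+)")


def parse_xref_section(data: bytes, offset: int) -> Tuple[XRefTable, Optional[int]]:
    """Parse one classic xref section at `offset`; return its entries and /Prev."""
    m = XREF.match(data, offset)
    if m is None:
        raise ValueError(f"no xref keyword at offset {offset}")
    pos = m.end()
    entries: XRefTable = {}
    while True:
        sub = SUBSECTION.match(data, pos)
        if sub is None:
            break
        first, count = int(sub.group(1)), int(sub.group(2))
        pos = sub.end()
        for i in range(count):
            e = ENTRY.match(data, pos)
            if e is None:
                raise ValueError(f"malformed xref entry at offset {pos}")
            entries[first + i] = (int(e.group(1)), int(e.group(2)), e.group(3) == b"n")
            pos = e.end()
    if not data.startswith(b"trailer", pos):
        raise ValueError(f"expected trailer at offset {pos}")
    # Simplification: look for /Prev textually between `trailer` and the
    # next `startxref` instead of parsing the trailer dictionary properly.
    end = data.find(b"startxref", pos)
    if end == -1:
        end = len(data)
    prev = PREV.search(data, pos, end)
    return entries, (int(prev.group(1)) if prev else None)


def build_xref(data: bytes) -> XRefTable:
    """Follow the /Prev chain from the last startxref; newer entries win."""
    idx = data.rfind(b"startxref")
    m = STARTXREF.match(data, idx) if idx >= 0 else None
    if m is None:
        raise ValueError("no startxref found")
    offset: Optional[int] = int(m.group(1))
    merged: XRefTable = {}
    seen = set()
    while offset is not None:
        if offset in seen:  # a /Prev cycle would otherwise loop forever
            raise ValueError(f"cycle in /Prev chain at offset {offset}")
        seen.add(offset)
        entries, prev = parse_xref_section(data, offset)
        for num, entry in entries.items():
            # Sections are visited newest first, so keep the first entry seen.
            merged.setdefault(num, entry)
        offset = prev
    return merged
```

Even this small sketch has to make choices that a specification should pin down, such as what to do on a `/Prev` cycle or on an entry count that does not match the subsection header. Here both are rejected with an error, which is one possible policy among several that implementations might adopt.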