Exploitation and Sanitization of Hidden Data in PDF Files: Do Security Agencies Sanitize Their PDF Files?

Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security. Pub Date: 2021-03-03. DOI: 10.1145/3437880.3460405

Supriya Adhatarao, C. Lauradoux


Citations: 6

Abstract

Organizations publish and share more and more electronic documents such as PDF files. Unfortunately, most organizations are unaware that these documents can expose sensitive information such as authors' names and details of the information system and architecture. All of this information can easily be exploited by attackers to footprint and later attack an organization. In this paper, we analyze hidden data found in the PDF files published by an organization. We gathered a corpus of 39664 PDF files published by 75 security agencies from 47 countries. We have been able to measure the quality and quantity of information exposed in these PDF files. This information can be used effectively to find weak links in an organization: the employees who are running outdated software. We have also measured the adoption of PDF file sanitization by security agencies. We identified only 7 security agencies that sanitize a few of their PDF files before publishing. Unfortunately, we were still able to find sensitive information within 65% of these sanitized PDF files. Some agencies are using weak sanitization techniques: proper sanitization requires removing all the hidden sensitive information from the file, not just the data at the surface. Security agencies need to change their sanitization methods.
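To make the kind of hidden data discussed in the abstract concrete, here is a minimal Python sketch that pulls author and software fields out of a PDF's document information dictionary. It is an illustration only, not the authors' tooling, which the abstract does not describe. The function name `extract_info_fields` and the sample bytes are hypothetical. The sketch assumes the metadata is stored uncompressed as simple literal strings. It will miss hex-encoded or UTF-16 strings, entries inside compressed object streams, and XMP metadata.

```python
import re

# Standard Info-dictionary keys that can reveal authors and the software used.
INFO_KEYS = ("Author", "Creator", "Producer", "Title")


def extract_info_fields(pdf_bytes: bytes) -> dict:
    """Collect literal-string values of selected Info keys from raw PDF bytes.

    Assumption: values are plain literal strings such as /Author (Jane Doe),
    with no nested or escaped parentheses. Anything else is skipped.
    """
    found = {}
    for key in INFO_KEYS:
        # Match "/Key (value)"; the character class stops at parentheses
        # and backslashes, so escaped or nested strings are not matched.
        pattern = re.compile(rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)")
        values = [raw.decode("latin-1") for raw in pattern.findall(pdf_bytes)]
        if values:
            found[key] = values
    return found


if __name__ == "__main__":
    # Hypothetical fragment standing in for a real file's bytes.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n"
        b"<< /Author (Jane Doe) /Producer (ExampleWriter 1.0) >>\n"
        b"endobj\n"
    )
    print(extract_info_fields(sample))
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Because this scan only sees data stored in the clear, an empty result does not show that a file has been sanitized.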

