{"title":"鲁棒PDF恶意软件检测与图像可视化和处理技术","authors":"Andrew Corum, Donovan Jenkins, Jun Zheng","doi":"10.1109/ICDIS.2019.00024","DOIUrl":null,"url":null,"abstract":"PDF, as one of most popular document file format, has been frequently utilized as a vector by attackers to covey malware due to its flexible file structure and the ability to embed different kinds of content. In this paper, we propose a new learning-based method to detect PDF malware using image processing and processing techniques. The PDF files are first converted to grayscale images using image visualization techniques. Then various image features representing the distinct visual characteristics of PDF malware and benign PDF files are extracted. Finally, learning algorithms are applied to create the classification models to classify a new PDF file as malicious or benign. The performance of the proposed method was evaluated using Contagio PDF malware dataset. The results show that the proposed method is a viable solution for PDF malware detection. It is also shown that the proposed method is more robust to resist reverse mimicry attacks than the state-of-art learning-based method.","PeriodicalId":181673,"journal":{"name":"2019 2nd International Conference on Data Intelligence and Security (ICDIS)","volume":"151 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Robust PDF Malware Detection with Image Visualization and Processing Techniques\",\"authors\":\"Andrew Corum, Donovan Jenkins, Jun Zheng\",\"doi\":\"10.1109/ICDIS.2019.00024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PDF, as one of most popular document file format, has been frequently utilized as a vector by attackers to covey malware due to its flexible file structure and the ability to embed different kinds of content. In this paper, we propose a new learning-based method to detect PDF malware using image processing and processing techniques. The PDF files are first converted to grayscale images using image visualization techniques. Then various image features representing the distinct visual characteristics of PDF malware and benign PDF files are extracted. Finally, learning algorithms are applied to create the classification models to classify a new PDF file as malicious or benign. The performance of the proposed method was evaluated using Contagio PDF malware dataset. The results show that the proposed method is a viable solution for PDF malware detection. It is also shown that the proposed method is more robust to resist reverse mimicry attacks than the state-of-art learning-based method.\",\"PeriodicalId\":181673,\"journal\":{\"name\":\"2019 2nd International Conference on Data Intelligence and Security (ICDIS)\",\"volume\":\"151 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 2nd International Conference on Data Intelligence and Security (ICDIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDIS.2019.00024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 2nd International Conference on Data Intelligence and Security (ICDIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDIS.2019.00024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Robust PDF Malware Detection with Image Visualization and Processing Techniques
PDF, as one of most popular document file format, has been frequently utilized as a vector by attackers to covey malware due to its flexible file structure and the ability to embed different kinds of content. In this paper, we propose a new learning-based method to detect PDF malware using image processing and processing techniques. The PDF files are first converted to grayscale images using image visualization techniques. Then various image features representing the distinct visual characteristics of PDF malware and benign PDF files are extracted. Finally, learning algorithms are applied to create the classification models to classify a new PDF file as malicious or benign. The performance of the proposed method was evaluated using Contagio PDF malware dataset. The results show that the proposed method is a viable solution for PDF malware detection. It is also shown that the proposed method is more robust to resist reverse mimicry attacks than the state-of-art learning-based method.