基于特征选择的文件类型识别算法

D. Cao, Junyong Luo, Meijuan Yin, Huijie Yang
{"title":"基于特征选择的文件类型识别算法","authors":"D. Cao, Junyong Luo, Meijuan Yin, Huijie Yang","doi":"10.1109/ICICISYS.2010.5658559","DOIUrl":null,"url":null,"abstract":"Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers can be easily spoofed, while a more reliable way is based on analyzing the file's binary content. We propose an algorithm to generate models for each file type based on analyzing the binary contents of a set of known input files by using n-gram analysis and design a novel feature selection evaluation function for extracting signatures from the models, then using the signatures to recognize the true type of unknown files. Our aim is not to use the structure and key words of any specific file types as this allows the approach to be applied to general file types. Experiments show that the proposed approach is promising especially when the feature selection evaluation function is applied.","PeriodicalId":339711,"journal":{"name":"2010 IEEE International Conference on Intelligent Computing and Intelligent Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Feature selection based file type identification algorithm\",\"authors\":\"D. Cao, Junyong Luo, Meijuan Yin, Huijie Yang\",\"doi\":\"10.1109/ICICISYS.2010.5658559\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers can be easily spoofed, while a more reliable way is based on analyzing the file's binary content. We propose an algorithm to generate models for each file type based on analyzing the binary contents of a set of known input files by using n-gram analysis and design a novel feature selection evaluation function for extracting signatures from the models, then using the signatures to recognize the true type of unknown files. Our aim is not to use the structure and key words of any specific file types as this allows the approach to be applied to general file types. Experiments show that the proposed approach is promising especially when the feature selection evaluation function is applied.\",\"PeriodicalId\":339711,\"journal\":{\"name\":\"2010 IEEE International Conference on Intelligent Computing and Intelligent Systems\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on Intelligent Computing and Intelligent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICISYS.2010.5658559\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Intelligent Computing and Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICISYS.2010.5658559","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

识别任意文件的真实类型在信息安全中非常重要。基于文件扩展名或幻数的方法很容易被欺骗,而更可靠的方法是基于分析文件的二进制内容。本文提出了一种基于n-gram分析的已知输入文件二进制内容的模型生成算法,并设计了一种新的特征选择评估函数,用于从模型中提取签名,然后使用签名识别未知文件的真实类型。我们的目标是不使用任何特定文件类型的结构和关键字,因为这允许该方法应用于一般文件类型。实验结果表明,该方法具有较好的应用前景,特别是在应用特征选择评价函数时。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Feature selection based file type identification algorithm
Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers can be easily spoofed, while a more reliable way is based on analyzing the file's binary content. We propose an algorithm to generate models for each file type based on analyzing the binary contents of a set of known input files by using n-gram analysis and design a novel feature selection evaluation function for extracting signatures from the models, then using the signatures to recognize the true type of unknown files. Our aim is not to use the structure and key words of any specific file types as this allows the approach to be applied to general file types. Experiments show that the proposed approach is promising especially when the feature selection evaluation function is applied.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信