ITC-MNP: a diverse dataset for image file fragment classification.

Impact factor: 1.6 | JCR Q2 | Multidisciplinary Sciences

BMC Research Notes 17(1):363 | Published: 2024-12-19 | DOI: 10.1186/s13104-024-07034-w

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658453/pdf/

Behnam Tavassoli, Zhino Naghshbandi, Mehdi Teimouri

Citations: 0

Abstract

Objectives: Image file fragment classification is a critical area of study in digital forensics. However, many publicly available datasets in this field are derived from a single source and often do not account for diversity in image settings and content. A method's effectiveness can only be demonstrated convincingly by evaluating it on datasets sampled from varied data sources. A sufficiently diverse dataset is therefore crucial for a realistic assessment of any proposed method.

Data description: The dataset contains 4096-byte image file fragments from five formats (JPG, BMP, GIF, PNG, and TIFF), each processed with different conversion settings. The source images fall into three content types: Nature, People, and Medical. In total, the dataset contains 501,000 fragments. The fragments include file headers and incomplete end-of-file fragments; the latter are completed with random bytes to approximate how operating systems handle data when a file's size is not a multiple of the sector size. This approach aims to simulate typical scenarios in which fragments are recovered from a hard drive, although it may not capture all real-world complexities, such as data corruption and complex file structures.
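
The fragmentation and padding step described above can be illustrated with a short Python sketch. It assumes, as a simplification, that fragments are cut at consecutive 4096-byte offsets from the start of each file and that only the final, short fragment needs random padding. The function name `fragment_file` is hypothetical, and this is not the authors' actual dataset-generation pipeline.

```python
import random
from typing import List, Optional

FRAGMENT_SIZE = 4096  # fragment length stated in the dataset description


def fragment_file(data: bytes, fragment_size: int = FRAGMENT_SIZE,
                  rng: Optional[random.Random] = None) -> List[bytes]:
    """Split a file's bytes into fixed-size fragments (hypothetical helper).

    The first fragment holds the file header. If the file length is not a
    multiple of fragment_size, the final (end-of-file) fragment is padded
    with random bytes up to fragment_size.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    rng = rng or random.Random()
    fragments = []
    for start in range(0, len(data), fragment_size):
        chunk = data[start:start + fragment_size]
        if len(chunk) < fragment_size:
            pad_len = fragment_size - len(chunk)
            # Random bytes stand in for the unused tail of the last sector.
            chunk += bytes(rng.getrandbits(8) for _ in range(pad_len))
        fragments.append(chunk)
    return fragments


if __name__ == "__main__":
    # Stand-in data: a PNG signature followed by 10,000 zero bytes (10,008 bytes).
    sample = b"\x89PNG\r\n\x1a\n" + bytes(10_000)
    frags = fragment_file(sample, rng=random.Random(0))
    print(len(frags))  # 3: two full fragments plus one padded end-of-file fragment
```

Seeding the `random.Random` instance makes the padding reproducible, which is convenient when regenerating a dataset; omitting the seed yields different padding on each run.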

Source journal

BMC Research Notes
Subject area: Biochemistry, Genetics and Molecular Biology (all)

CiteScore: 3.60

Self-citation rate: 0.00%

Articles published: 363

Review time: 15 weeks

Journal description: BMC Research Notes publishes scientifically valid research outputs that cannot be considered as full research or methodology articles. We support the research community across all scientific and clinical disciplines by providing an open access forum for sharing data and useful information; this includes, but is not limited to, updates to previous work, additions to established methods, short publications, null results, research proposals and data management plans.