Farzana Islam, Sumaya, Md Fahad Monir, Ashraful Islam
FabricSpotDefect: An annotated dataset for identifying spot defects in different fabric types
Data in Brief, vol. 57, article 111165 (published 2024-11-24). DOI: 10.1016/j.dib.2024.111165
Abstract
The FabricSpotDefect dataset is, to the best of our knowledge, the first dataset designed specifically to challenge computer vision models to detect fabric spots accurately. It contains 1014 raw images with 3288 manually annotated spots of different categories. After six categories of augmentation (flipping, rotation, shearing, saturation adjustment, brightness adjustment, and noise addition) are applied, the dataset expands to 2300 augmented images. Annotations were made manually on the original images rather than on the augmented ones, so that they reflect real-world conditions. The augmented images are provided in two versions: a YOLOv8 version with 7641 annotations and a COCO version with 7635 annotations. Augmentation increases data diversity, which helps reduce overfitting and improve model robustness. The dataset covers many fabric types, such as cotton, linen, silk, denim, patterned textiles, and jacquard fabrics, and many kinds of spots, such as stains, discolorations, oil marks, rust, and blood marks. Spots like these are difficult to detect by manual inspection or other traditional methods. The images were captured under household lighting, using ordinary everyday clothes in normal conditions, which grounds FabricSpotDefect in real-world applications. The dataset is organized for easy use in training, validating, and testing machine learning (ML) models, and because it consists of real, authentic images, it can be reused at any time. Researchers and developers are free to use this prebuilt dataset with artificial intelligence (AI) tools that improve quality control in the textile industry, for example checking the quality of fabrics used in clothing or in medical textiles such as surgical gloves, masks, gauze, and aprons. Spot defects are marked precisely with bounding boxes and polygons.
The dataset is available on Roboflow in several formats, including COCO and YOLOv8, which work with different ML frameworks. We claim that the dataset is unique because it covers a wide range of fabrics and challenging spot defects, many of them on patterned and colorful prints, where detecting defects is especially difficult because of the complexity of the print.
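The abstract notes that the annotations are exported in both COCO and YOLOv8 formats and that flipping is one of the augmentations. The Python below is an illustrative sketch, not the authors' tooling, of how one spot bounding box is written in each format and how a horizontal flip changes it. It assumes the standard conventions: COCO boxes are `[x_min, y_min, width, height]` in absolute pixels, and YOLO detection labels are `class x_center y_center width height` normalized by image size. The names `CocoBox`, `YoloBox`, `coco_to_yolo`, `hflip_yolo`, and `to_yolo_line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CocoBox:
    """Hypothetical COCO-style box: top-left corner plus size, in absolute pixels."""
    x_min: float
    y_min: float
    width: float
    height: float


@dataclass(frozen=True)
class YoloBox:
    """Hypothetical YOLO-style box: center plus size, normalized to [0, 1]."""
    x_center: float
    y_center: float
    width: float
    height: float


def coco_to_yolo(box: CocoBox, img_w: int, img_h: int) -> YoloBox:
    """Convert a top-left/size pixel box to a normalized center/size box."""
    return YoloBox(
        x_center=(box.x_min + box.width / 2) / img_w,
        y_center=(box.y_min + box.height / 2) / img_h,
        width=box.width / img_w,
        height=box.height / img_h,
    )


def hflip_yolo(box: YoloBox) -> YoloBox:
    """Horizontal flip: in normalized coordinates only the x center mirrors."""
    return YoloBox(1.0 - box.x_center, box.y_center, box.width, box.height)


def to_yolo_line(class_id: int, box: YoloBox) -> str:
    """Format one row of a YOLO label file: class id, then four normalized values."""
    return (
        f"{class_id} {box.x_center:.6f} {box.y_center:.6f} "
        f"{box.width:.6f} {box.height:.6f}"
    )


if __name__ == "__main__":
    # Assumed example: a 64x48 px spot at (100, 50) in a 640x480 image, class 0.
    spot = coco_to_yolo(CocoBox(100, 50, 64, 48), img_w=640, img_h=480)
    print(to_yolo_line(0, spot))              # 0 0.206250 0.154167 0.100000 0.100000
    print(to_yolo_line(0, hflip_yolo(spot)))  # 0 0.793750 0.154167 0.100000 0.100000
```

Working in normalized coordinates keeps the flip independent of image resolution. Shearing and rotation, by contrast, generally change a box's extent, which is one reason polygon annotations can be more precise for irregular spots.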
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.