Zelu Qi, Da Pan, Tianyi Niu, Zefeng Ying, Ping Shi
Displays, Volume 84, Article 102793. Published 2024-07-04. DOI: 10.1016/j.displa.2024.102793. Impact factor: 3.7; JCR Q1 (Computer Science, Hardware & Architecture). https://www.sciencedirect.com/science/article/pii/S0141938224001574
Bridge the gap between practical application scenarios and cartoon character detection: A benchmark dataset and deep learning model
The success of deep learning in computer vision has made cartoon character detection (CCD), built on object detection, a promising means of protecting intellectual property rights. However, for lack of suitable cartoon character datasets, CCD remains underexplored, and many problems must still be solved to meet the needs of practical applications such as merchandise, advertising, and patent review. In this paper, we propose a new, challenging CCD benchmark dataset called CCDaS, which consists of 140,339 images of 524 famous cartoon characters from 227 cartoon works, game works, and merchandise innovations. As far as we know, CCDaS is currently the largest CCD dataset for practical application scenarios. To further study CCD, we also provide a CCD algorithm, called multi-path YOLO (MP-YOLO), that accurately detects multi-scale objects and facially similar objects in practical application scenarios. Experimental results show that MP-YOLO achieves better detection results on the CCDaS dataset. Comparative and ablation studies further validate the effectiveness of our CCD dataset and algorithm.
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.