Sketch-SparseNet: Sparse convolution framework for sketch recognition
Jingru Yang, Jin Wang, Yang Zhou, Guodong Lu, Yu Sun, Huan Yu, Heming Fang, Zhihui Li, Shengfeng He
Pattern Recognition, Volume 167, Article 111682 (published 2025-04-24)
DOI: 10.1016/j.patcog.2025.111682
Abstract
In free-hand sketch recognition, state-of-the-art methods often struggle to extract spatial features from sketches with sparse distributions, which are characterized by significant blank regions devoid of informative content. To address this challenge, we introduce a novel framework for sketch recognition, termed Sketch-SparseNet. This framework incorporates an advanced convolutional component: the Sketch-Driven Dilated Deformable Block (SD³B). This component excels at extracting spatial features and accurately recognizing free-hand sketches with sparse distributions. The SD³B component innovatively bridges gaps in the blank areas of sketches by establishing spatial relationships among disconnected stroke points through adaptive reshaping of convolution kernels. These kernels are deformable, dilatable, and dynamically positioned relative to the sketch strokes, ensuring the preservation of spatial information from sketch points. Consequently, Sketch-SparseNet extracts a more accurate and compact representation of spatial features, enhancing sketch recognition performance. Additionally, we introduce the SmoothAlign loss function, which minimizes the disparity between the output features of the parallel SD³B and CNN branches, facilitating effective feature fusion. Extensive evaluations on the QuickDraw-414k and TU-Berlin datasets highlight our method's state-of-the-art performance, achieving accuracies of 79.52% and 85.78%, respectively. To our knowledge, this work represents the first application of a sparse convolution framework that substantially alleviates the adverse effects of sparse sketch points. The code is available at https://github.com/kingbackyang/Sketch-SparseNet.
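The abstract states only that SmoothAlign "minimizes the disparity between the output features" of the two parallel branches; it does not give the exact formula. A plausible minimal sketch, assuming the loss takes a smooth-L1 form (quadratic for small feature gaps, linear for large ones, a common choice for alignment penalties), could look like this; the function name, the `beta` threshold, and the element-wise formulation are all illustrative assumptions, not the paper's definition:

```python
import numpy as np

def smooth_align_loss(feat_sparse: np.ndarray, feat_cnn: np.ndarray, beta: float = 1.0) -> float:
    """Hypothetical SmoothAlign-style loss between two parallel feature maps.

    feat_sparse : features from the sparse-convolution (SD3B-like) branch
    feat_cnn    : features from the ordinary CNN branch
    beta        : transition point between quadratic and linear penalty
    """
    diff = np.abs(feat_sparse - feat_cnn)
    # Quadratic near zero (gentle gradients once the branches roughly agree),
    # linear for large gaps (robust to outlier activations).
    per_element = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_element.mean())
```

Under this assumed form, identical branch outputs give zero loss, and the penalty grows only linearly when the branches disagree strongly, which keeps the fusion signal stable early in training.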
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.