Sketch-SparseNet: Sparse convolution framework for sketch recognition

Impact factor: 7.5 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Jingru Yang, Jin Wang, Yang Zhou, Guodong Lu, Yu Sun, Huan Yu, Heming Fang, Zhihui Li, Shengfeng He
Journal: Pattern Recognition, volume 167, Article 111682
Published: 2025-04-24
DOI: 10.1016/j.patcog.2025.111682
Source: https://www.sciencedirect.com/science/article/pii/S0031320325003425
Open access: no
Citations: 0

Abstract

In free-hand sketch recognition, state-of-the-art methods often struggle to extract spatial features from sparsely distributed sketches, which are characterized by large blank regions devoid of informative content. To address this challenge, we introduce a novel framework for sketch recognition, termed Sketch-SparseNet. The framework incorporates an advanced convolutional component, the Sketch-Driven Dilated Deformable Block (SD³B), which excels at extracting spatial features from, and accurately recognizing, free-hand sketches with sparse distributions. SD³B bridges the blank areas of a sketch by adaptively reshaping its convolution kernels to establish spatial relationships among disconnected stroke points. These kernels are deformable, dilatable, and dynamically positioned relative to the sketch strokes, which preserves the spatial information carried by the sketch points. Consequently, Sketch-SparseNet extracts a more accurate and compact representation of spatial features, enhancing sketch recognition performance. Additionally, we introduce the SmoothAlign loss function, which minimizes the disparity between the output features of the parallel SD³B and CNN branches, facilitating effective feature fusion. Extensive evaluations on the QuickDraw-414k and TU-Berlin datasets highlight our method’s state-of-the-art performance, with accuracies of 79.52% and 85.78%, respectively. To our knowledge, this work represents the first application of a sparse convolution framework that substantially alleviates the adverse effects of sparse sketch points. The code is available at https://github.com/kingbackyang/Sketch-SparseNet.
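The idea of a kernel that reaches across blank regions can be illustrated with a toy example. The sketch below is not the authors' SD³B: it swaps in a fixed dilation rate in place of learned, deformable offsets, and every name in it (`sparse_dilated_conv`, `ones_3x3`, `strokes`) is hypothetical. It only shows, under those assumptions, how evaluating a convolution at stroke pixels alone and widening the kernel lets two disconnected stroke points see each other.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]  # (row, col) of a stroke pixel


def sparse_dilated_conv(
    active: Dict[Point, float],
    kernel: Dict[Point, float],
    dilation: int = 1,
) -> Dict[Point, float]:
    """Evaluate a convolution only at active (stroke) pixels.

    `active` maps stroke coordinates to feature values; blank pixels are
    simply absent. `kernel` maps integer offsets such as (-1, 0) to weights.
    Each offset is scaled by `dilation` before sampling, so a larger
    dilation lets the kernel reach across blank space to a distant stroke.
    (Illustrative assumption: fixed dilation, no learned offsets.)
    """
    out: Dict[Point, float] = {}
    for (r, c) in active:
        total = 0.0
        for (dr, dc), w in kernel.items():
            neighbour = (r + dr * dilation, c + dc * dilation)
            # Blank pixels are missing from `active` and contribute zero.
            total += w * active.get(neighbour, 0.0)
        out[(r, c)] = total
    return out


# All-ones 3x3 kernel: the output counts the active pixels under the kernel.
ones_3x3 = {(dr, dc): 1.0 for dr in (-1, 0, 1) for dc in (-1, 0, 1)}

# Two stroke points on the same row, separated by two blank pixels.
strokes = {(0, 0): 1.0, (0, 3): 1.0}

dense_view = sparse_dilated_conv(strokes, ones_3x3, dilation=1)
dilated_view = sparse_dilated_conv(strokes, ones_3x3, dilation=3)

print(dense_view)    # each stroke point sees only itself
print(dilated_view)  # each stroke point also sees the other one
```

With `dilation=1` the two points are isolated, while `dilation=3` places a kernel tap on the other point, so its feature enters the sum. A deformable kernel, as described in the abstract, would presumably learn such offsets per location instead of using one fixed rate.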
Source journal
Pattern Recognition (Engineering and Technology: Electrical and Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.