Spprnet: A Robust CNN for Library Material Recognition via Spatial Pyramid Pooling and Heterogeneous Convolution

IF 1.8 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Li FeiFei, Meng Qi, Hong Bo, Zhang Lixiang, Ji Wen
Computational Intelligence, Vol. 41, No. 4. Published 2025-06-26. DOI: 10.1111/coin.70094
https://onlinelibrary.wiley.com/doi/10.1111/coin.70094
Citations: 0

Abstract


In library environments, the varied scales of materials and the batch instability caused by inconsistent scanning conditions pose challenges for image recognition. Because traditional ResNet architectures are constrained to a fixed input size, their recognition accuracy can drop on images of arbitrary size. In this study, we build on the ResNet18 network: we introduce a novel heterogeneous convolution strategy, adjust the batch normalization operations, and incorporate a spatial pyramid pooling module to remove these limitations. The resulting architecture, termed SPPRNet, supports flexible processing of arbitrary-sized inputs and combines multi-scale convolution kernels (3 × 3, 5 × 5, 7 × 7) to capture fine-grained features and global contextual patterns simultaneously. Quantitative results on general datasets show that our method achieves a 25.47% Top-1 error rate on ImageNet (compared to 30.55% for ResNet18) and attains 92.95% mAP on the Caltech-101 dataset for object detection tasks, outperforming mainstream models such as VGG-16 and MobileNet. The robust performance of our method on image tasks can be extended to existing approaches to further improve the quality of image recognition in library scenarios.
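
The abstract does not spell out how the spatial pyramid pooling module is configured, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes max pooling over a single 2-D feature map and pyramid levels of 1, 2 and 4 bins per side; the function name `spp_max_pool` and the `levels` parameter are hypothetical. It illustrates why such a layer lets a network accept arbitrary-sized inputs: the pooled vector has the same length whatever the size of the feature map.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length list.

    For each pyramid level n (an assumed setting, not taken from the paper),
    the map is divided into an n x n grid of bins and the maximum of each bin
    is kept. The output therefore always has sum(n * n for n in levels)
    values, independent of the height and width of the input.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end, so the bins
            # cover the whole map and are never empty, even when n does
            # not divide the size evenly (neighbouring bins may overlap).
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[r * 5 + c for c in range(5)] for r in range(4)]      # 4 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]    # 9 x 13
print(len(spp_max_pool(small)), len(spp_max_pool(large)))      # 21 21
```

With levels 1, 2 and 4 each channel contributes 1 + 4 + 16 = 21 values, so a fully connected classifier placed after the pooling step sees a constant input width even when the scanned images differ in size.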

Source journal: Computational Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 6.90
Self-citation rate: 3.60%
Articles published: 65
Review time: >12 weeks
Journal description: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.