SPPRNet: A Robust CNN for Library Material Recognition via Spatial Pyramid Pooling and Heterogeneous Convolution

IF 1.8; Zone 4, Computer Science; JCR Q3, Computer Science, Artificial Intelligence

Computational Intelligence. Pub Date: 2025-06-26. DOI: 10.1111/coin.70094

Li FeiFei, Meng Qi, Hong Bo, Zhang Lixiang, Ji Wen

Journal Article. Computational Intelligence, Volume 41, Issue 4. Full text: https://onlinelibrary.wiley.com/doi/10.1111/coin.70094

Citations: 0

Abstract



In library environments, the diverse scales of materials and the batch instability caused by inconsistent scanning conditions pose challenges for image recognition tasks. Traditional ResNet architectures, constrained to a fixed input size, can lose recognition accuracy on images of arbitrary size. In this study, we build on the ResNet18 network: we introduce a novel heterogeneous convolution strategy, adjust the batch normalization operations, and incorporate a spatial pyramid pooling module to remove these limitations. The resulting architecture, termed SPPRNet, supports flexible processing of arbitrary-sized inputs and combines multi-scale convolution kernels ($3 \times 3$, $5 \times 5$, $7 \times 7$) to capture fine-grained features and global contextual patterns simultaneously. Quantitative results on general datasets show that our method achieves a 25.47% Top-1 error rate on ImageNet (compared to 30.55% for ResNet18) and attains 92.95% mAP on the Caltech-101 dataset for object detection tasks, outperforming mainstream models such as VGG-16 and MobileNet. The robust performance of our method in image tasks can be extended to existing approaches to further improve the quality of image recognition in library scenarios.
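The claim that spatial pyramid pooling lets the network accept arbitrary-sized inputs can be illustrated with a small sketch. The abstract does not state SPPRNet's pyramid configuration, so the levels (1, 2, 4) below are an assumption taken from common SPP setups, and the names `adaptive_max_pool`, `spatial_pyramid_pool`, and `make_maps` are hypothetical. The sketch uses plain Python lists rather than a deep learning framework and is not the authors' implementation; it only shows why the pooled vector has the same length for any input height and width.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one feature map (channel) as rows of floats


def adaptive_max_pool(fmap: Matrix, bins: int) -> List[float]:
    """Max-pool one feature map into a bins x bins grid (row-major order).

    Window i along an axis of length n covers indices
    floor(i * n / bins) up to ceil((i + 1) * n / bins) - 1, so every
    window is non-empty and the grid size does not depend on n.
    """
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * height) // bins
        r1 = ((i + 1) * height + bins - 1) // bins  # integer ceiling
        for j in range(bins):
            c0 = (j * width) // bins
            c1 = ((j + 1) * width + bins - 1) // bins
            pooled.append(
                max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def spatial_pyramid_pool(
    fmaps: Sequence[Matrix], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Concatenate pooled grids from every pyramid level and channel.

    The levels (1, 2, 4) are an assumed configuration, not taken from the paper.
    """
    features: List[float] = []
    for bins in levels:
        for fmap in fmaps:
            features.extend(adaptive_max_pool(fmap, bins))
    return features


def make_maps(channels: int, height: int, width: int) -> List[Matrix]:
    """Build toy feature maps with distinct values (illustration only)."""
    return [
        [[float(ch + r * width + c) for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


# Two inputs with different spatial sizes give vectors of identical length.
small = spatial_pyramid_pool(make_maps(2, 5, 7))
large = spatial_pyramid_pool(make_maps(2, 13, 9))
print(len(small), len(large))
```

With C channels and the assumed levels (1, 2, 4), the output always holds C × (1 + 4 + 16) = 21C values, which is what allows a fixed-size classifier layer to follow feature maps of any spatial size.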


Source journal

Computational Intelligence (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore

6.90

Self-citation rate

3.60%

Articles published

Review time

>12 weeks

About the journal: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.