Li FeiFei, Meng Qi, Hong Bo, Zhang Lixiang, Ji Wen
Journal: Computational Intelligence, volume 41, issue 4
Published: 2025-06-26
DOI: 10.1111/coin.70094
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70094
Impact factor: 1.8 (JCR Q3, Computer Science, Artificial Intelligence)
SPPRNet: A Robust CNN for Library Material Recognition via Spatial Pyramid Pooling and Heterogeneous Convolution
In library environments, the diverse scales of materials and the batch instability caused by inconsistent scanning conditions pose challenges for image recognition. Traditional ResNet architectures are constrained to a fixed input size, which can reduce their recognition accuracy on images of arbitrary size. In this study, we introduce a novel heterogeneous convolution strategy, adjust the batch normalization operations, and incorporate a spatial pyramid pooling module based on the ResNet18 network to remove these limitations. The resulting architecture, termed SPPRNet, supports flexible processing of arbitrary-sized inputs and combines multi-scale convolution kernels (3 × 3, 5 × 5, 7 × 7) to capture fine-grained features and global contextual patterns simultaneously. Quantitative results on general datasets show that our method achieves a 25.47% Top-1 error rate on ImageNet (compared to 30.55% for ResNet18) and attains 92.95% mAP on the Caltech-101 dataset for object detection tasks, outperforming mainstream models such as VGG-16 and MobileNet. The robust performance of our method on image tasks can be extended to existing approaches to further improve the quality of image recognition in library scenarios.
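To illustrate why spatial pyramid pooling lets a network accept inputs of arbitrary size, the following is a minimal pure-Python sketch of the general technique, not the SPPRNet implementation. The pyramid levels `(1, 2, 4)`, the use of max pooling, the single-channel toy feature maps, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def adaptive_max_pool(feature_map, bins):
    """Max-pool a 2D feature map (list of rows) into a bins x bins grid."""
    height, width = len(feature_map), len(feature_map[0])
    if bins > height or bins > width:
        raise ValueError("feature map is smaller than the requested grid")
    pooled = []
    for i in range(bins):
        # Bin edges: floor for the start, ceil for the end, so every cell
        # of the map falls into at least one bin and no bin is empty.
        r0, r1 = (i * height) // bins, math.ceil((i + 1) * height / bins)
        for j in range(bins):
            c0, c1 = (j * width) // bins, math.ceil((j + 1) * width / bins)
            pooled.append(max(feature_map[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate the pooled grids of every level into one flat vector.

    The output length is sum(n * n for n in levels), regardless of the
    height and width of the input map. The levels are an assumed choice.
    """
    vector = []
    for bins in levels:
        vector.extend(adaptive_max_pool(feature_map, bins))
    return vector


def make_map(height, width):
    """Toy single-channel feature map with distinct values (hypothetical)."""
    return [[r * width + c for c in range(width)] for r in range(height)]


small = spatial_pyramid_pool(make_map(6, 9))
large = spatial_pyramid_pool(make_map(13, 7))
print(len(small), len(large))  # both 21 = 1 + 4 + 16
```

Because the pooled vector has the same length for a 6 × 9 map and a 13 × 7 map, a fully connected classifier placed after it does not need the input image to be resized to a fixed resolution first.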
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.