WoodGLNet: a multi-scale network integrating global and local information for real-time classification of wood images

IF 3; Zone 4 (Computer Science); JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Real-Time Image Processing; Pub Date: 2024-08-05; DOI: 10.1007/s11554-024-01521-w

Zhishuai Zheng, Zhedong Ge, Zhikang Tian, Xiaoxia Yang, Yucheng Zhou


Citations: 0

Abstract


Current research on image classification has combined convolutional neural networks (CNNs) and transformers to introduce inductive biases into the model and enhance its ability to handle long-range dependencies. However, these integrated models have limitations. Standard CNNs are static: their convolutions cannot adjust dynamically to the input image, which limits feature expression. When CNNs are combined with transformers, this static nature also impedes seamless integration between the features generated dynamically by self-attention and the static features generated by convolution. Furthermore, each model stage contains abundant information that single-scale convolution cannot fully exploit, which ultimately degrades the network’s classification performance.

To tackle these challenges, we propose WoodGLNet, a real-time multi-scale pyramid network that aggregates global and local information in an input-dependent manner and facilitates feature interaction through three scales of convolution. WoodGLNet uses efficient multi-scale global spatial decay attention modules and input-dependent multi-scale dynamic convolutions at different stages, enhancing the network’s inductive biases and expanding the effective receptive field. On the CIFAR100 and CIFAR10 image classification tasks, WoodGLNet-T achieves Top-1 accuracies of 76.34% and 92.35%, respectively, outperforming EfficientNet-B3 by 1.03 and 0.86 percentage points. WoodGLNet-S attains Top-1 accuracies of 77.56% and 93.66%, and WoodGLNet-B attains 80.12% and 94.27%, on the same two tasks.

The experimental samples were sourced from the Shandong Province Construction Structural Material Specimen Museum, which is tasked with wood testing and requires high real-time performance. To assess WoodGLNet’s real-time detection capabilities, 20 types of precious wood from the museum were identified in real time using the network. WoodGLNet achieved a classification accuracy of up to 99.60%, with a recognition time of 0.013 s per image. These findings demonstrate the network’s exceptional real-time classification and generalization abilities.
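
The figures in the abstract can be put in more familiar terms with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the EfficientNet-B3 accuracies are not quoted in the abstract but derived here from the stated margins (assuming those margins are absolute percentage points measured under the same setup), and the throughput figure assumes images are processed one at a time at the reported 0.013 s each.

```python
# Top-1 accuracies (percent) quoted in the abstract for WoodGLNet-T.
WOODGLNET_T = {"CIFAR100": 76.34, "CIFAR10": 92.35}
# Reported margins over EfficientNet-B3, in percentage points.
MARGIN_OVER_EFFICIENTNET_B3 = {"CIFAR100": 1.03, "CIFAR10": 0.86}
# Reported recognition time for a single image, in seconds.
SECONDS_PER_IMAGE = 0.013


def implied_baseline(accuracy: float, margin: float) -> float:
    """Baseline accuracy implied by an absolute (percentage-point) margin.

    This is a derived value, not a number reported in the abstract.
    """
    return round(accuracy - margin, 2)


def throughput(seconds_per_image: float) -> float:
    """Images per second, assuming strictly sequential single-image inference."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    for dataset, acc in WOODGLNET_T.items():
        base = implied_baseline(acc, MARGIN_OVER_EFFICIENTNET_B3[dataset])
        print(f"{dataset}: WoodGLNet-T {acc:.2f}% vs implied EfficientNet-B3 {base:.2f}%")
    print(f"Throughput: {throughput(SECONDS_PER_IMAGE):.1f} images/s")
```

Under these assumptions the reported latency corresponds to roughly 77 images per second, which is the sense in which the classification can be called real-time.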


Source journal

Journal of Real-Time Image Processing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 6.80
Self-citation rate: 6.70%
Articles published:
Review time: 6 months

About the journal: Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed. Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application.

It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system. The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing or utilizing image processing systems which must satisfy real-time design constraints.