Title: A lightweight convolutional neural network-based feature extractor for visible images
Authors: Xujie He, Jing Jin, Yu Jiang, Dandan Li
Journal: Computer Vision and Image Understanding, volume 249, Article 104157
Publication date: 2024-09-12
DOI: 10.1016/j.cviu.2024.104157
URL: https://www.sciencedirect.com/science/article/pii/S1077314224002388
Citation count: 0
Abstract
Feature extraction networks (FENs) play a critical role as the first stage of many computer vision tasks. Previous studies of FENs employed deeper and wider networks to attain higher accuracy, but these approaches were memory-inefficient and computationally intensive. Here, we present an accurate and lightweight feature extractor (RoShuNet) for visible images based on ShuffleNetV2. The improvements are threefold. To make ShuffleNetV2 compact without degrading its feature extraction ability, we propose an aggregated dual group convolutional module; to better aid the channel interflow process, we propose a γ-weighted shuffling module; to further reduce the complexity and size of the model, we introduce slimming strategies. Classification experiments demonstrate the state-of-the-art (SOTA) performance of RoShuNet, which increases accuracy while reducing model complexity and size relative to ShuffleNetV2. Generalization experiments verify that the proposed method is also applicable to feature extraction in semantic segmentation and multiple-object tracking, achieving accuracy comparable to that of other approaches with greater memory and computational efficiency. Our method provides a novel perspective for designing lightweight models.
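The abstract does not describe the internals of the aggregated dual group convolutional module or the γ-weighted shuffling module, so the sketch below does not attempt to reproduce them. It only illustrates, in plain Python, the two standard building blocks those modules start from: the ShuffleNet-style channel shuffle and the weight saving of a group convolution. The function names `channel_shuffle` and `conv_params` are hypothetical and chosen for illustration.

```python
def channel_shuffle(channels, groups):
    """Standard ShuffleNet-style channel shuffle on a flat list of channels.

    This is the baseline (unweighted) shuffle, not the paper's
    gamma-weighted variant, whose details the abstract does not give.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the list as (groups, per_group), transpose it to
    # (per_group, groups), then flatten back to one dimension.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a 2D convolution layer, ignoring the bias."""
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return kernel * kernel * (c_in // groups) * c_out


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
print(conv_params(64, 64, 3))                     # 36864
print(conv_params(64, 64, 3, groups=2))           # 18432
```

With two groups, the shuffle interleaves the channels of both groups so that the next grouped layer sees inputs from each of them, and the grouped 3×3 convolution needs half the weights of the dense one. This is the kind of trade-off between channel interflow and model size that the abstract refers to.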
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems