Determining the proper number of proposals for individual images
Authors: Zihang He, Yong Li
Journal: IET Computer Vision, vol. 18, no. 1, pp. 141-149 (JCR Q4, Computer Science, Artificial Intelligence; impact factor 1.5)
Published: 2023-08-03
DOI: 10.1049/cvi2.12230
Article page: https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12230
The region proposal network is indispensable to two-stage object detection methods. It generates a fixed number of proposals, which the detection heads then classify and regress to produce detection boxes. However, a fixed number of proposals may be too large when an image contains only a few objects and too small when it contains many more. The authors therefore explored choosing the number of proposals according to the number of objects in an image, aiming to reduce computational cost while improving detection accuracy. Since the number of ground-truth objects is unknown at inference time, the authors designed a simple but effective module that predicts the number of foreground regions, which is used in place of the number of objects when determining the proposal number. Experiments with various two-stage detection methods on several datasets, including MS-COCO, PASCAL VOC, and CrowdHuman, showed that adding the designed module increased detection accuracy while decreasing the FLOPs of the detection head. For example, on the PASCAL VOC dataset, applying the module to Libra R-CNN and Grid R-CNN increased AP50 by over 1.5 while decreasing the FLOPs of the detection heads from 28.6 G to nearly 9.0 G.
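The idea of a per-image proposal count can be illustrated with a minimal Python sketch. This is not the authors' module: the abstract does not say how the predicted number of foreground regions is mapped to a proposal number, so the linear rule and the names `proposal_budget`, `select_proposals`, `per_region`, `min_keep`, and `max_keep` are all hypothetical, chosen only to show the general shape of the approach.

```python
import math
from typing import List, Tuple

# A proposal is (objectness score, (x1, y1, x2, y2)); this layout is an assumption.
Proposal = Tuple[float, Tuple[float, float, float, float]]


def proposal_budget(predicted_foreground: float,
                    per_region: int = 10,
                    min_keep: int = 50,
                    max_keep: int = 1000) -> int:
    """Map a predicted foreground-region count to a proposal count.

    Hypothetical rule: keep `per_region` proposals per predicted foreground
    region, clamped to [min_keep, max_keep]. The paper's actual mapping is
    not given in the abstract.
    """
    budget = math.ceil(predicted_foreground * per_region)
    return max(min_keep, min(max_keep, budget))


def select_proposals(proposals: List[Proposal],
                     predicted_foreground: float,
                     **budget_kwargs: int) -> List[Proposal]:
    """Keep only the top-scoring proposals, up to the per-image budget."""
    k = proposal_budget(predicted_foreground, **budget_kwargs)
    ranked = sorted(proposals, key=lambda p: p[0], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # 100 dummy proposals with scores 0.00 .. 0.99 and identical boxes.
    dummy = [(i / 100, (0.0, 0.0, 1.0, 1.0)) for i in range(100)]
    kept = select_proposals(dummy, predicted_foreground=2,
                            per_region=3, min_keep=5)
    print(len(kept), kept[0][0])
```

Under this sketch, an image with few predicted foreground regions passes fewer proposals to the detection head than a crowded one, which is the mechanism the abstract credits for the reduced head FLOPs.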
About the journal:
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, but not forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision.
IET Computer Vision welcomes submissions on the following topics:
Biologically and perceptually motivated approaches to low level vision (feature detection, etc.)
Perceptual grouping and organisation
Representation, analysis and matching of 2D and 3D shape
Shape-from-X
Object recognition
Image understanding
Learning with visual inputs
Motion analysis and object tracking
Multiview scene analysis
Cognitive approaches in low, mid and high level vision
Control in visual systems
Colour, reflectance and light
Statistical and probabilistic models
Face and gesture
Surveillance
Biometrics and security
Robotics
Vehicle guidance
Automatic model acquisition
Medical image analysis and understanding
Aerial scene analysis and remote sensing
Deep learning models in computer vision
Both methodological and applications orientated papers are welcome.
Manuscripts submitted are expected to include a detailed and analytical review of the literature and state-of-the-art exposition of the original proposed research and its methodology, its thorough experimental evaluation, and last but not least, comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review.
Special Issues Current Call for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf