LCNet: Location Combination for Object Detection
Xin Yi, Bo Ma
Proceedings of the 6th International Conference on Digital Signal Processing, published 2022-02-25
DOI: 10.1145/3529570.3529596 (https://doi.org/10.1145/3529570.3529596)
Citations: 0
Abstract
Object detection is a widely studied task in computer vision. In recent years, milestone approaches and solid benchmarks have been proposed, which have significantly advanced related research. Previous object detection methods follow a common paradigm: the classification head and the regression head share the same feature extracted by the backbone network. In this paper, we revisit this paradigm for two-stage detectors and show that the regression head can achieve better results by using local features. In our proposed Location Combination Networks (LCNet), we extract the effective regions of the feature in a Laplace way, and we introduce auxiliary confidence gain loss, Intersection over Union (IoU) gain loss, and distribution loss to guide its convergence. In the classification head, we combine these local features into the global feature for better classification. In the regression head, we rank these effective regions along the spatial dimension, select the local features closest to each foreground boundary, and use the selected features to predict the offset of that boundary. Finally, we combine the locations of the four boundaries to obtain the final bounding box prediction. Extensive experimental results on the MS COCO benchmark validate the effectiveness of the proposed method.
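
The regression step described in the abstract (pick the local region nearest each boundary, predict that boundary's offset, then combine the four boundaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LocalRegion` container, the nearest-center selection rule, the pixel-valued offsets, and the function names are all assumptions made for illustration, and the abstract does not specify how the effective regions or the losses are actually computed. The `iou` helper is the standard Intersection over Union, included only to compare boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class LocalRegion:
    """Hypothetical container: center of an effective region plus the
    boundary offsets (left, top, right, bottom) predicted from its feature."""
    cx: float
    cy: float
    offsets: Tuple[float, float, float, float]


def combine_boundaries(proposal: Box, regions: List[LocalRegion]) -> Box:
    """Refine each boundary with the region nearest to it (assumed rule)."""
    x1, y1, x2, y2 = proposal
    # Rank regions along each axis and keep the one closest to each boundary.
    left = min(regions, key=lambda r: abs(r.cx - x1))
    top = min(regions, key=lambda r: abs(r.cy - y1))
    right = min(regions, key=lambda r: abs(r.cx - x2))
    bottom = min(regions, key=lambda r: abs(r.cy - y2))
    # Combine the four independently refined boundary locations into one box.
    return (
        x1 + left.offsets[0],
        y1 + top.offsets[1],
        x2 + right.offsets[2],
        y2 + bottom.offsets[3],
    )


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    proposal = (10.0, 10.0, 50.0, 50.0)
    regions = [
        LocalRegion(12.0, 30.0, (-2.0, 0.0, 0.0, 0.0)),  # near left edge
        LocalRegion(30.0, 11.0, (0.0, -1.0, 0.0, 0.0)),  # near top edge
        LocalRegion(48.0, 30.0, (0.0, 0.0, 3.0, 0.0)),   # near right edge
        LocalRegion(30.0, 49.0, (0.0, 0.0, 0.0, 4.0)),   # near bottom edge
    ]
    refined = combine_boundaries(proposal, regions)
    print(refined)  # (8.0, 9.0, 53.0, 54.0)
```

In this toy setup each boundary is refined by a different region, which is the point of treating the four boundaries separately rather than regressing the whole box from one shared feature.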