Salient Knowledge-Based Object Detection
Xueyuan Zhang, Chunzhe Wang, Han Du, Li Quan, Jin Shi, Yirong Ma
2022 4th International Conference on Control and Robotics (ICCR), published 2 December 2022
DOI: 10.1109/ICCR55715.2022.10053899
Humans use their visual systems to perceive objects of interest in images and videos, drawing on past experience that includes shapes, textures, spatial knowledge, and other subconscious information. In this paper, we develop an end-to-end object detection framework that incorporates salient knowledge of objects. First, we use convolutional neural networks (CNNs) to extract multi-scale feature maps that represent general knowledge of the objects in images and videos. Second, a candidate feature map is selected from the extracted feature maps, the salient knowledge of objects is encoded from it using a mathematical strategy, and a new feature map is generated by combining the candidate feature map with this salient knowledge. Third, we use the saliency-enhanced feature map together with the feature maps at other scales to identify and localize objects. Experimental results on PASCAL VOC 2007 and PASCAL VOC 2012 show that the proposed approach outperforms other attention-based object detectors, indicating that its predictions are consistent with how the human brain perceives objects. In addition, our approach processes 43 frames per second on an NVIDIA GTX 1080 GPU, making it more practical in terms of running-time efficiency.
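The abstract does not specify how the candidate scale is chosen or what the "mathematical strategy" is, so the following is only a minimal, hypothetical sketch of the data flow it describes, written in plain Python so it runs without extra libraries. Every rule in it is an assumption, not the authors' method: the candidate is taken to be the scale with the highest mean absolute activation, the salient knowledge is modeled as a channel-averaged activation map min-max scaled to [0, 1], and fusion is residual re-weighting x * (1 + s). All function names are illustrative.

```python
from typing import List

# A feature map is stored as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]
Grid = List[List[float]]


def mean_activation(fmap: FeatureMap) -> float:
    """Mean absolute activation over all channels and positions."""
    values = [abs(v) for channel in fmap for row in channel for v in row]
    return sum(values) / len(values)


def select_candidate(feature_maps: List[FeatureMap]) -> int:
    """Hypothetical selection rule: index of the scale with the strongest mean activation."""
    scores = [mean_activation(f) for f in feature_maps]
    return max(range(len(scores)), key=scores.__getitem__)


def saliency_map(fmap: FeatureMap) -> Grid:
    """Hypothetical 'salient knowledge': channel-averaged activation, min-max scaled to [0, 1]."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(channels)) / channels
            for j in range(width)] for i in range(height)]
    lo = min(min(row) for row in avg)
    hi = max(max(row) for row in avg)
    span = hi - lo
    if span == 0:
        # A spatially flat map carries no saliency signal, so return all zeros.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / span for v in row] for row in avg]


def fuse(fmap: FeatureMap, sal: Grid) -> FeatureMap:
    """Hypothetical fusion: x * (1 + s) keeps every original feature and boosts salient positions."""
    return [[[x * (1.0 + sal[i][j]) for j, x in enumerate(row)]
             for i, row in enumerate(channel)] for channel in fmap]


def salient_pyramid(feature_maps: List[FeatureMap]) -> List[FeatureMap]:
    """Replace the candidate scale with its saliency-enhanced version; other scales pass through."""
    k = select_candidate(feature_maps)
    enhanced = fuse(feature_maps[k], saliency_map(feature_maps[k]))
    return [enhanced if i == k else f for i, f in enumerate(feature_maps)]


# Toy example: a 2-channel 2x2 "fine" scale and a 1-channel 1x1 "coarse" scale.
fine = [[[1.0, 0.0], [0.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]]]
coarse = [[[0.5]]]
out = salient_pyramid([fine, coarse])
```

Only the candidate scale is modified; the remaining scales are passed through unchanged, mirroring the abstract's statement that the fused map is used together with the feature maps at other scales. The detection heads that would consume this pyramid are omitted.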