{"title":"KP2Dtiny: Quantized Neural Keypoint Detection and Description on the Edge","authors":"Thomas Rüegg, Marco Giordano, Michele Magno","doi":"10.1109/AICAS57966.2023.10168598","DOIUrl":null,"url":null,"abstract":"Detection and description of keypoints in images is a fundamental component of a wide range of tasks such as Simultaneous Localization And Mapping (SLAM), image alignment and structure from motion (SfM). Efficient computation of these features is crucial for real-time applications and has been addressed by multiple handcrafted algorithms and, recently, by deep neural network-based detectors. Learned detectors achieve high detection performance, but pose high computational requirements, making them slow and impractical for low-power resource constraint platforms. This paper presents a quantized neural keypoint detector and descriptor optimized for edge devices exploiting two recent AI platforms such as MAX78000 by Analog Devices and the Coral AI USB accelerator from Google. To accommodate the diverse constraints and requirements of various applications, we propose and evaluate two model architectures (KP2DtinySmall and KP2DtinyFast) and deploy them on the aforementioned platforms using full 8-bit integer quantization. Furthermore, we extensively evaluate these models in terms of power, latency and accuracy, reporting results on three image sizes (88x88, 320x240 and 640x480), evaluating both quantized and non-quantized models. Fully quantized, KP2DtinySmall reduces network size by a factor of 54x while improving homographic estimation accuracy on 88x88 images on the most stringent threshold (Correctness d1) by 32.4% (0.550) and on 320x240 images by 10.7% (0.648) compared to the KeypointNet architecture by Yang You et. al. This result is achieved by designing a new network with low power platforms in mind, particularly addressing the lower resolution by increasing the density of detectable features. Deployed on the MAX78000 MCU, inference of low-resolution images is run at 59 FPS, consuming 1.1 mJ per image. On the Coral usb accelerator, KP2DtinyFast runs inference on low-resolution images at 527 FPS consuming 3.1 mJ, on high resolution it achieves 70 FPS at 19.9 mJ per inference.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS57966.2023.10168598","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Detection and description of keypoints in images is a fundamental component of a wide range of tasks such as Simultaneous Localization And Mapping (SLAM), image alignment, and structure from motion (SfM). Efficient computation of these features is crucial for real-time applications and has been addressed by multiple handcrafted algorithms and, more recently, by deep neural network-based detectors. Learned detectors achieve high detection performance but impose high computational requirements, making them slow and impractical for low-power, resource-constrained platforms. This paper presents a quantized neural keypoint detector and descriptor optimized for edge devices, exploiting two recent AI platforms: the MAX78000 by Analog Devices and the Coral AI USB accelerator from Google. To accommodate the diverse constraints and requirements of various applications, we propose and evaluate two model architectures (KP2DtinySmall and KP2DtinyFast) and deploy them on the aforementioned platforms using full 8-bit integer quantization. Furthermore, we extensively evaluate these models in terms of power, latency, and accuracy, reporting results on three image sizes (88×88, 320×240, and 640×480) for both quantized and non-quantized models. Fully quantized, KP2DtinySmall reduces network size by a factor of 54× while improving homography estimation accuracy at the most stringent threshold (Correctness d1) by 32.4% (0.550) on 88×88 images and by 10.7% (0.648) on 320×240 images compared to the KeypointNet architecture by Yang You et al. This result is achieved by designing a new network with low-power platforms in mind, in particular addressing the lower resolution by increasing the density of detectable features. Deployed on the MAX78000 MCU, inference on low-resolution images runs at 59 FPS, consuming 1.1 mJ per image. On the Coral USB accelerator, KP2DtinyFast runs inference on low-resolution images at 527 FPS consuming 3.1 mJ per image; at high resolution it achieves 70 FPS at 19.9 mJ per inference.
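The abstract refers to deploying the models with full 8-bit integer quantization on the Coral AI USB accelerator. As a rough illustration only (not the authors' actual pipeline), the sketch below shows how a SavedModel version of such a network could be converted to a fully integer-quantized TensorFlow Lite model using a representative calibration dataset, which is what the Coral Edge TPU requires; the model path, the 1×88×88×3 input shape, and the random calibration data are placeholders.

```python
# Sketch: full 8-bit integer post-training quantization with TensorFlow Lite,
# as required for Coral Edge TPU deployment. The model path, input shape and
# calibration data below are illustrative placeholders, not from the paper.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Calibration samples drive the estimation of int8 scales/zero-points.
    # A real calibration run would iterate over actual dataset images.
    for _ in range(100):
        yield [np.random.rand(1, 88, 88, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("kp2dtiny_fast_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force all operators, inputs and outputs to 8-bit integers.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("kp2dtiny_fast_int8.tflite", "wb") as f:
    f.write(tflite_model)
# The resulting .tflite file would then be compiled for the Edge TPU
# (e.g. with the edgetpu_compiler tool) before running on the Coral accelerator.
```

Deployment on the MAX78000 follows a different toolchain (Analog Devices' ai8x training and synthesis tools), which this sketch does not cover.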