Towards Transformer-Based Real-Time Object Detection at the Edge: A Benchmarking Study
Colin Samplawski, Benjamin M. Marlin
MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), published 2021-11-29
DOI: 10.1109/MILCOM52596.2021.9653052 (https://doi.org/10.1109/MILCOM52596.2021.9653052)
Citations: 1
Abstract
Recent work has demonstrated the success of end-to-end transformer-based object detection models. These models achieve predictive performance that is competitive with current state-of-the-art detection frameworks without many of the hand-crafted components that previous models require, such as non-maximum suppression and anchor boxes. In this paper, we provide the first benchmarking study of transformer-based detection models focused on real-time and edge deployment. We show that transformer-based detection architectures can achieve detection rates of 30 FPS on NVIDIA Jetson edge hardware and exceed 40 FPS on desktop hardware. However, we observe that achieving these latency levels within the design space that we specify results in a drop in predictive performance, particularly on smaller objects. We conclude by discussing potential next steps for improving the edge and IoT deployment performance of this interesting new class of models.
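The frame rates quoted in the abstract are throughput figures derived from per-frame inference latency. The abstract does not describe the measurement procedure, so the following is only a minimal sketch of how such a number could be obtained, not the authors' benchmarking code. The names `benchmark_fps`, `meets_real_time`, and `dummy_detector` are hypothetical, and the dummy workload merely stands in for a detector forward pass.

```python
import time
from statistics import median
from typing import Callable, Dict


def benchmark_fps(infer: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Time repeated calls to `infer` and report latency and throughput.

    `infer` is a hypothetical zero-argument callable that runs one
    single-frame inference; it is an assumption of this sketch.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up calls are discarded so one-time costs (lazy initialization,
    # cache fills) do not distort the measured latencies.
    for _ in range(warmup):
        infer()

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)

    mean_latency = sum(latencies) / len(latencies)
    return {
        "mean_latency_ms": mean_latency * 1e3,
        "median_latency_ms": median(latencies) * 1e3,
        # Frames per second is the reciprocal of the mean per-frame latency.
        "fps": float("inf") if mean_latency == 0 else 1.0 / mean_latency,
    }


def meets_real_time(stats: Dict[str, float], target_fps: float = 30.0) -> bool:
    """Check measured throughput against a target such as the 30 FPS cited above."""
    return stats["fps"] >= target_fps


def dummy_detector() -> int:
    # Hypothetical stand-in workload; a real study would call the model here.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = benchmark_fps(dummy_detector)
    print(stats, "real-time:", meets_real_time(stats))
```

When the model runs on a GPU, kernels typically execute asynchronously, so a real measurement would likely need to synchronize the device before reading the timer (for example with `torch.cuda.synchronize()` in PyTorch); that step is omitted here to keep the sketch dependency-free.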