Battery-Free Camera Occupancy Detection System
Ali Saffari, Sin Yong Tan, Mohamad Katanbaf, Homagni Saha, Joshua R. Smith, S. Sarkar
Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, 2021-06-25. DOI: https://doi.org/10.1145/3469116.3470013

Abstract: Occupancy detection systems are commonly equipped with high-quality cameras and a processor with high computational power to run detection algorithms. This paper presents a human occupancy detection system that uses battery-free cameras and a deep learning model implemented on a low-cost hub to detect human presence. Our low-resolution camera harvests energy from ambient light and transmits data to the hub using backscatter communication. We implement the state-of-the-art YOLOv5 detection network, which offers high detection accuracy and fast inference, on a Raspberry Pi 4 Model B. We achieve an inference speed of ~100 ms per image and an overall detection accuracy of over 90% using only 2 GB of RAM on the Raspberry Pi. Our experimental results also demonstrate that the detection is robust to noise, illuminance, occlusion, and angle of depression.
Are Mobile DNN Accelerators Accelerating DNNs?
Qingqing Cao, Alexandru Eugen Irimiea, Mohamed Abdelfattah, A. Balasubramanian, N. Lane
Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, 2021-06-24. DOI: https://doi.org/10.1145/3469116.3470011

Abstract: Deep neural networks (DNNs) run on many mobile and embedded devices with the goals of energy efficiency and the highest possible performance. However, DNN workloads are becoming more computationally intensive while their deployment is ever-increasing. This has led to the creation of many purpose-built low-power neural accelerators that replace or augment traditional mobile CPUs and GPUs. In this work, we provide an in-depth study of one set of commercially available mobile accelerators, the Intel Neural Compute Sticks (NCS). We perform a systematic measurement study of the latency and energy of this accelerator under a variety of DNNs, including convolutional neural networks (CNNs) for vision tasks and attention-based Transformer models for NLP tasks. We compare against the mobile processors (CPU, GPU, and DSP) of a smartphone and a mobile development board. Our study shows that commercial mobile accelerators like the NCS are not yet ready to deliver their claimed performance. We also point out directions for optimizing model architectures to better suit these accelerators.
{"title":"Towards Ubiquitous Learning: A First Measurement of On-Device Training Performance","authors":"Dongqi Cai, Qipeng Wang, Yuanqiang Liu, Yunxin Liu, Shangguang Wang, Mengwei Xu","doi":"10.1145/3469116.3470009","DOIUrl":"https://doi.org/10.1145/3469116.3470009","url":null,"abstract":"We are witnessing the emergence of ubiquitous learning, where each device (smartphones, wearables, IoTs, etc) can learn from their environments either alone or collaboratively. Such a new paradigm is enabled by deep learning techniques, or more specifically, on-device training. Given its popularity in the machine learning community, unfortunately, there are no systematic understandings of a critical question: how much cost does it take to train typical deep models on commodity end devices? Therefore, this work performs comprehensive measurements of on-device training with the state-of-the-art training library, 6 mobile phones, and 5 classical neural networks. Our measurements report metrics of training time, energy consumption, memory footprint, hardware utilization, and thermal dynamics, thus help reveal a complete landscape of the on-device training performance. The observations from the measurements help guide us to several promising future directions to efficiently enable ubiquitous learning.","PeriodicalId":162801,"journal":{"name":"Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123226989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking Video Object Detection Systems on Embedded Devices under Resource Contention
Jayoung Lee, Pengcheng Wang, Ran Xu, Venkateswara Dasari, Noah Weston, Yin Li, S. Bagchi, S. Chaterji
Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, 2021-06-24. DOI: https://doi.org/10.1145/3469116.3470010

Abstract: Adaptive and efficient computer vision systems have been proposed to optimize computer vision tasks, such as object classification and object detection, for embedded boards and mobile devices. These studies focus on optimizing the model (deep network) or the system itself, either by designing an efficient network architecture or by adapting the network architecture at runtime using approximation knobs such as image size, type of object tracker, and head of the object detector (e.g., lighter-weight heads such as one-shot detectors like YOLO over two-shot detectors like FRCNN). In this work, we benchmark different video object detection protocols, including FastAdapt, with respect to accuracy, latency, and energy consumption on three embedded boards that represent leading-edge mobile GPUs. Our set of protocols consists of Faster R-CNN, YOLOv3, SELSA, MEGA, and REPP. Further, we characterize their performance under different levels of resource contention, specifically GPU contention, as would arise from co-located applications contending with the video object detection task on these boards. Our first key insight is that object detectors must be coupled with trackers to keep up with latency requirements (e.g., 30 fps); with this, FastAdapt achieves up to 76 fps on the best-resourced NVIDIA Jetson-class board, the NVIDIA AGX Xavier. Second, adaptive protocols like FastAdapt, FRCNN, and YOLO (specifically, our adaptive variants FRCNN+ and YOLO+) work well under resource constraints. Among the latest video object detection heads, SELSA achieves the highest accuracy, but at a latency of over 2 seconds per frame. Our energy consumption experiments show that FastAdapt, adaptive FRCNN, and adaptive YOLO are best in class relative to the non-adaptive protocols SELSA, MEGA, and REPP.
Enabling Binary Neural Network Training on the Edge
Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor José Nunes Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, G. Constantinides
Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, 2021-06-24. DOI: https://doi.org/10.1145/3469116.3470015
{"title":"ParallelFusion","authors":"Jingyu Lee, Yunxin Liu, Youngki Lee","doi":"10.1145/3469116.3470014","DOIUrl":"https://doi.org/10.1145/3469116.3470014","url":null,"abstract":"Mobile GPUs are extremely under-utilized for DNN computations across different mobile deep learning frameworks and multiple DNNs with various complexities. We explore the feasibility of batching and it improves the throughput by up to 35%. However, real-time applications in mobile have a limited amount of requests to get a benefit from batching. To tackle the challenge, we present ParallelFusion technique that enables concurrent execution of heterogeneous operators to further utilize the mobile GPU. We implemented ParallelFusion over the MNN framework and evaluated on 6 state-of-the-art DNNs. Our evaluation shows that Parallel Fusion achieves up to 195% to 218% throughput with fused execution of 2 and 3 operators compared to single DNN inference.","PeriodicalId":162801,"journal":{"name":"Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning","volume":"43 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120899421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions","authors":"Stefanos Laskaridis, Alexandros Kouris, N. Lane","doi":"10.1145/3469116.3470012","DOIUrl":"https://doi.org/10.1145/3469116.3470012","url":null,"abstract":"DNNs are becoming less and less over-parametrised due to recent advances in efficient model design, through careful hand-crafted or NAS-based methods. Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. Particularly, early-exit networks comprise an emerging direction for tailoring the computation depth of each input sample at runtime, offering complementary performance gains to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks to its key components and survey the recent advances in each one of them. We also position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.","PeriodicalId":162801,"journal":{"name":"Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125630142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}