{"title":"Geryon: Edge Assisted Real-time and Robust Object Detection on Drones via mmWave Radar and Camera Fusion","authors":"Kaikai Deng, Dong Zhao, Qiaoyue Han, Shuyue Wang, Zihan Zhang, Anfu Zhou, Huadong Ma","doi":"10.1145/3550298","DOIUrl":"https://doi.org/10.1145/3550298","url":null,"abstract":"Vision-based drone-view object detection suffers from severe performance degradation under adverse conditions (e.g., foggy weather, poor illumination). To remedy this, leveraging complementary mmWave radar has become a trend. However, existing fusion approaches seldom apply to drones due to i) the aggravated sparsity and noise of point clouds from low-cost commodity radars, and ii) explosive sensing data and intensive computations leading to high latency. To address these issues, we design Geryon, an edge assisted object detection system on drones, which utilizes a suite of approaches to fully exploit the complementary advantages of camera and mmWave radar on three levels: (i) a novel multi-frame compositing approach utilizes camera to assist radar to address the aggravated sparsity and noise of radar point clouds; (ii) a saliency area extraction and encoding approach utilizes radar to assist camera to reduce the bandwidth consumption and offloading latency; (iii) a parallel transmission and inference approach with a lightweight box enhancement scheme further reduces the offloading latency while ensuring the edge-side accuracy-latency trade-off by the parallelism and better camera-radar fusion. We implement and evaluate Geryon with four datasets we collect under foggy/rainy/snowy weather and poor illumination conditions, demonstrating its great advantages over other state-of-the-art approaches in terms of both accuracy and latency.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"70 1","pages":"109:1-109:27"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72733418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClenchClick: Hands-Free Target Selection Method Leveraging Teeth-Clench for Augmented Reality","authors":"Xiyuan Shen, Yukang Yan, Chun Yu, Yuanchun Shi","doi":"10.1145/3550327","DOIUrl":"https://doi.org/10.1145/3550327","url":null,"abstract":"We propose to explore teeth-clenching-based target selection in Augmented Reality (AR), as the subtlety in the interaction can be beneficial to applications occupying the user’s hand or that are sensitive to social norms. To support the investigation, we implemented an EMG-based teeth-clenching detection system (ClenchClick), where we adopted customized thresholds for different users. We first explored and compared the potential interaction design leveraging head movements and teeth clenching in combination. We finalized the interaction to take the form of a Point-and-Click manner with clenches as the confirmation mechanism. We evaluated the task load and performance of ClenchClick by comparing it with two baseline methods in target selection tasks. Results showed that ClenchClick outperformed hand gestures in workload, physical load, accuracy and speed, and outperformed dwell in workload and temporal load. Lastly, through user studies, we demonstrated the advantage of ClenchClick in real-world tasks, including efficient and accurate hands-free target selection, natural and unobtrusive interaction in public, and robust head gesture input. We investigated the interaction design, user experience in target selection tasks, and user performance in real-world tasks in a series of user studies. In our first user study, we explored nine potential designs and compared the three most promising designs (ClenchClick, ClenchCrossingTarget, ClenchCrossingEdge) with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline in target selection tasks. ClenchClick had the best overall user experience with the lowest workload. 
It outperformed Hand Gesture in both physical and temporal load, and outperformed Dwell in temporal and mental load. In the second study, we evaluated the performance of ClenchClick with two detection methods (General and Personalized), in comparison with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline. Results showed that ClenchClick outperformed Hand Gesture in accuracy (98.9% vs. 89.4%), and was comparable with Dwell in accuracy and efficiency. We further investigated users’ behavioral characteristics by analyzing their cursor trajectories in the tasks, which showed that ClenchClick was a smoother target selection method. It was more psychologically friendly and occupied less of the user’s attention. Finally, we conducted user studies in three real-world tasks which supported hands-free, social-friendly, and head gesture interaction. Results revealed that ClenchClick is an efficient and accurate target selection method when both hands are occupied. It is social-friendly and satisfying when performing in public, and can serve as activation to head gestures which significantly alleviates false positive issues.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"27 1","pages":"139:1-139:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73774807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MobiVQA: Efficient On-Device Visual Question Answering","authors":"Qingqing Cao","doi":"10.1145/3534619","DOIUrl":"https://doi.org/10.1145/3534619","url":null,"abstract":"Visual Question Answering (VQA) is a relatively new task where a user can ask a natural question about an image and obtain an answer. VQA is useful for many applications and is widely popular for users with visual impairments. Our goal is to design a VQA application that works efficiently on mobile devices without requiring cloud support. Such a system will allow users to ask visual questions privately, without having to send their questions to the cloud, while also reducing cloud communication costs. However, existing VQA applications use deep learning models that significantly improve accuracy but are computationally heavy. Unfortunately, existing techniques that optimize deep learning for mobile devices cannot be applied to VQA because the VQA task is multi-modal—it requires processing both vision and text data. Existing mobile optimizations that work for vision-only or text-only neural networks cannot be applied here because of the dependencies between the two modes. Instead, we design MobiVQA, a set of optimizations that leverage the multi-modal nature of VQA. We show, using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models. For instance, MobiVQA can answer a visual question in 163 milliseconds on the phone, compared to over 20-second latency incurred by the most accurate state-of-the-art model, while incurring less than 1 point reduction in accuracy.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"107 1","pages":"44:1-44:23"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74642437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WiAdv: Practical and Robust Adversarial Attack against WiFi-based Gesture Recognition System","authors":"Yuxuan Zhou, Huangxun Chen, Chenyu Huang, Qian Zhang","doi":"10.1145/3534618","DOIUrl":"https://doi.org/10.1145/3534618","url":null,"abstract":"WiFi-based gesture recognition systems have attracted enormous interest owing to the non-intrusive nature of WiFi signals and the wide adoption of WiFi for communication. Despite boosted performance via integrating advanced deep neural network (DNN) classifiers, there has been insufficient investigation of their security vulnerabilities, which are rooted in the open nature of the wireless medium and the inherent defects (e.g., adversarial attacks) of classifiers. To fill this gap, we aim to study adversarial attacks on DNN-powered WiFi-based gesture recognition to encourage proper countermeasures. We design WiAdv to construct physically realizable adversarial examples to fool these systems. WiAdv features a signal synthesis scheme to craft adversarial signals with desired motion features based on the fundamental principle of WiFi-based gesture recognition, and a black-box attack scheme to handle the inconsistency between the perturbation space and the input space of the classifier caused by the in-between non-differentiable processing modules. We realize and evaluate our attack strategies against a representative state-of-the-art system, Widar3.0, in realistic settings. The experimental results show that the adversarial wireless signals generated by WiAdv achieve over 70% attack success rate on average, and remain robust and effective across different physical settings. Our attack case study and analysis reveal the vulnerability of WiFi-based gesture recognition systems, and we hope WiAdv could help promote the improvement of the relevant systems.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"176 1","pages":"92:1-92:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73444985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IndexPen: Two-Finger Text Input with Millimeter-Wave Radar","authors":"Hao‐Shun Wei, Ziheng Li, Alexander D. Galvan, Zhuoran Su, Xiao Zhang, E. Solovey, K. Pahlavan","doi":"10.1145/3534601","DOIUrl":"https://doi.org/10.1145/3534601","url":null,"abstract":"In this paper, we introduce IndexPen, a novel interaction technique for text input through two-finger in-air micro-gestures, enabling touch-free, effortless, tracking-based interaction, designed to mirror real-world writing. Our system is based on millimeter-wave radar sensing, and does not require instrumentation on the user. IndexPen can successfully identify 30 distinct gestures, representing the letters A-Z, as well as Space, Backspace, Enter, and a special Activation gesture to prevent unintentional input. Additionally, we include a noise class to differentiate gesture and non-gesture noise. We present our system design, including the radio frequency (RF) processing pipeline, classification model, and real-time detection algorithms. We further demonstrate our proof-of-concept system with data collected over ten days with five participants yielding 95.89% cross-validation accuracy on 31 classes (including noise). Moreover, we explore the learnability and adaptability of our system for real-world text input with 16 participants who are first-time users of IndexPen over five sessions. After each session, the pre-trained model from the previous five-user study is calibrated on the data collected so far for a new user through transfer learning. The F-1 score showed an average increase of 9.14% per session with the calibration, reaching an average of 88.3% on the last session across the 16 users. Meanwhile, we show that the users can type sentences with IndexPen at 86.2% accuracy, measured by string similarity. 
This work builds a foundation and vision for future interaction interfaces that could be enabled with this paradigm.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"39 1","pages":"79:1-79:39"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87089522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AmbiEar: mmWave Based Voice Recognition in NLoS Scenarios","authors":"J. Zhang, Yinian Zhou, Rui Xi, Shuai Li, Junchen Guo, Yuan He","doi":"10.1145/3550320","DOIUrl":"https://doi.org/10.1145/3550320","url":null,"abstract":"Millimeter wave (mmWave) based sensing is a significant technique that enables innovative smart applications, e.g., voice recognition. The existing works in this area require direct sensing of the human’s near-throat region and consequently have limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that the human’s voice causes correlated vibrations of the surrounding objects, regardless of the human’s position and posture. Therefore, AmbiEar regards the surrounding objects as ears that can perceive sound and realizes indirect sensing of the human’s voice by sensing the vibration of the surrounding objects. By incorporating designs such as common component extraction, signal superimposition, and an encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar has a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1%, compared to the direct sensing approach.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"6 1","pages":"151:1-151:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87327725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LoEar: Push the Range Limit of Acoustic Sensing for Vital Sign Monitoring","authors":"Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, Daqing Zhang","doi":"10.1145/3550293","DOIUrl":"https://doi.org/10.1145/3550293","url":null,"abstract":"Acoustic sensing has been explored in numerous applications leveraging the wide deployment of acoustic-enabled devices. However, most of the existing acoustic sensing systems work only in a very short range due to fast attenuation of ultrasonic signals, hindering their real-world deployment. In this paper, we present a novel acoustic sensing system using only a single microphone and speaker, named LoEar, to detect vital signs (respiration and heartbeat) with a significantly increased sensing range. We first develop a model, namely Carrierforming, to enhance the signal-to-noise ratio (SNR) via coherent superposition across multiple subcarriers on the target path. We then propose a novel technique called Continuous-MUSIC (Continuous-MUltiple SIgnal Classification) to detect dynamic reflections containing subtle motion, and further identify the target user based on the frequency distribution to enable Carrierforming. Finally, we adopt an adaptive Infinite Impulse Response (IIR) comb notch filter to recover the heartbeat pattern from the Channel Frequency Response (CFR) measurements which are dominated by respiration and further develop a peak-based scheme to estimate respiration rate and heart rate. We conduct extensive experiments to evaluate our system, and results show that our system outperforms the state-of-the-art using commercial devices, i.e., the range of respiration sensing is increased from 2 m to 7 m, and the range of heartbeat sensing is increased from 1.2 m to 6.5 m.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"65 1","pages":"145:1-145:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76671850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SunBox: Screen-to-Camera Communication with Ambient Light","authors":"Miguel Chávez Tapia, Talia Xu, Zehang Wu, M. Z. Zamalloa","doi":"10.1145/3534602","DOIUrl":"https://doi.org/10.1145/3534602","url":null,"abstract":"A recent development in wireless communication is the use of optical shutters and smartphone cameras to create optical links solely from ambient light. At the transmitter, a liquid crystal display (LCD) modulates ambient light by changing its level of transparency. At the receiver, a smartphone camera decodes the optical pattern. This LCD-to-camera link requires low power levels at the transmitter, and it is easy to deploy because it does not require modifying the existing lighting infrastructure. The system, however, provides a low data rate of just a few tens of bps. This occurs because the LCDs used in the state-of-the-art are slow single-pixel transmitters. To overcome this limitation, we introduce a novel multi-pixel display. Our display is similar to a simple screen, but instead of using embedded LEDs to radiate information, it uses only the surrounding ambient light. We build a prototype, called SunBox, and evaluate it indoors and outdoors with both artificial and natural ambient light. Our results show that SunBox can achieve a throughput between 2 kbps and 10 kbps using a low-end smartphone camera with just 30 FPS. To the best of our knowledge, this is the first screen-to-camera system that works solely with ambient light.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"124 1","pages":"46:1-46:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77342239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CornerRadar: RF-Based Indoor Localization Around Corners","authors":"Shichao Yue, Hao He, Peng-Xia Cao, Kaiwen Zha, Masayuki Koizumi, D. Katabi","doi":"10.1145/3517226","DOIUrl":"https://doi.org/10.1145/3517226","url":null,"abstract":"Unmanned robots are increasingly used around humans in factories, malls, and hotels. As they navigate our space, it is important to ensure that such robots do not collide with people who suddenly appear as they turn a corner. Today, however, there is no practical solution for localizing people around corners. Optical solutions try to track hidden people through their visible shadows on the floor or a sidewall, but they can easily fail depending on the ambient light and the environment. More recent work has considered the use of radio frequency (RF) signals to track people and vehicles around street corners. However, past RF-based proposals rely on a simplistic ray-tracing model that fails in practical indoor scenarios. This paper introduces CornerRadar, an RF-based method that provides accurate around-corner indoor localization. CornerRadar addresses the limitations of the ray-tracing model used in past work. It does so through a novel encoding of how RF signals bounce off walls and occlusions. The encoding, which we call the hint map, is then fed to a neural network along with the radio signals to localize people around corners. Empirical evaluation with people moving around corners in 56 indoor environments shows that CornerRadar achieves a median error that is 3x to 12x smaller than past RF-based solutions for localizing people around corners.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"123 1","pages":"34:1-34:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79481741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}