{"title":"PRESS/HOLD/RELEASE Ultrasonic Gestures and Low Complexity Recognition Based on TCN","authors":"Emad A. Ibrahim, Min Li, J. P. D. Gyvez","doi":"10.1109/SiPS47522.2019.9020579","DOIUrl":null,"url":null,"abstract":"Targeting ultrasound-based gesture recognition, this paper proposes a new universal PRESS/HOLD/RELEASE approach that leverages the diversity of gestures performed on smart devices such as mobile phones and IoT nodes. The new set of gestures are generated by interleaving PRESS/HOLD/RELEASE patterns; abbreviated as P/H/R, with gestures like sweeps between a number of microphones. P/H/R patterns are constructed by a hand as it approaches a top of a microphone to generate a virtual Press. After that, the hand settles for an undefined period of time to generate a virtual Hold and finally departs to generate a virtual Release. The same hand can sweep to a 2nd microphone and perform another P/H/R. Interleaving the P/H/R patterns expands the number of performed gestures. Assuming an on-board speaker transmitting ultrasonic signals, the detection is performed on Doppler shift readings generated by a hand as it approaches and departs a top of a microphone. The Doppler shift readings are presented in a sequence of down-mixed ultrasonic spectrogram frames. We train a Temporal Convolutional Network (TCN) to classify the P/H/R patterns under different environmental noises. Our experimental results show that such P/H/R patterns at a top of a microphone can be achieved with 96.6% accuracy under different noise conditions. A group of P/H/R based gestures has been tested on commercially off-the-shelf (COTS) Samsung Galaxy S7 Edge. Different P/H/R interleaved gestures (such as sweeps, long taps, etc.) are designed using two microphones and a single speaker while using as low as $\\sim 5\\mathrm{K}$ parameters and as low as $\\sim 0.15$ Million operations (MOPs) in compute power per inference. The P/H/R interleaved set of gestures are intuitive and hence are easy to learn by end users. This paves its way to be deployed by smartphones and smart speakers for mass production.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SiPS47522.2019.9020579","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Targeting ultrasound-based gesture recognition, this paper proposes a new universal PRESS/HOLD/RELEASE approach that leverages the diversity of gestures performed on smart devices such as mobile phones and IoT nodes. The new set of gestures is generated by interleaving PRESS/HOLD/RELEASE patterns, abbreviated as P/H/R, with gestures such as sweeps between microphones. A P/H/R pattern is constructed by a hand as it approaches the top of a microphone to generate a virtual Press, settles there for an arbitrary period of time to generate a virtual Hold, and finally departs to generate a virtual Release. The same hand can then sweep to a second microphone and perform another P/H/R. Interleaving P/H/R patterns expands the number of gestures that can be performed. Assuming an on-board speaker transmitting ultrasonic signals, detection is performed on Doppler-shift readings generated by the hand as it approaches and departs from the top of a microphone. The Doppler-shift readings are presented as a sequence of down-mixed ultrasonic spectrogram frames. We train a Temporal Convolutional Network (TCN) to classify the P/H/R patterns under different environmental noises. Our experimental results show that such P/H/R patterns at the top of a microphone can be recognized with 96.6% accuracy under different noise conditions. A group of P/H/R-based gestures has been tested on a commercial off-the-shelf (COTS) Samsung Galaxy S7 Edge. Different P/H/R-interleaved gestures (such as sweeps, long taps, etc.) are designed using two microphones and a single speaker, while requiring as few as $\sim 5\mathrm{K}$ parameters and as little as $\sim 0.15$ million operations (MOPs) of compute per inference. The P/H/R-interleaved set of gestures is intuitive and hence easy for end users to learn, paving the way for deployment in mass-produced smartphones and smart speakers.
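To make the "down-mixed ultrasonic spectrogram frames" concrete, below is a minimal sketch of Doppler-shift extraction by down-mixing. It assumes the speaker emits a single continuous ultrasonic pilot tone; the carrier frequency (20 kHz), sample rate (48 kHz), filter cutoff, and STFT window sizes are all illustrative assumptions, not values taken from the paper.

```python
# Sketch: down-mix a mic recording around an assumed 20 kHz pilot tone so
# that hand-induced Doppler shifts appear as small offsets around 0 Hz,
# then slice the baseband signal into spectrogram frames.
import numpy as np
from scipy.signal import butter, sosfilt, stft

FS = 48_000   # assumed microphone sample rate (Hz)
F0 = 20_000   # assumed ultrasonic carrier frequency (Hz)

def doppler_spectrogram(mic: np.ndarray) -> np.ndarray:
    """Return a (num_frames, num_bins) magnitude spectrogram of the baseband."""
    t = np.arange(len(mic)) / FS
    # Complex down-mixing: the carrier lands at 0 Hz; approach/departure of a
    # hand shows up as positive/negative Doppler offsets.
    baseband = mic * np.exp(-2j * np.pi * F0 * t)
    # Keep only a few hundred Hz of Doppler bandwidth around the carrier.
    sos = butter(4, 500, btype="low", fs=FS, output="sos")
    baseband = sosfilt(sos, baseband)
    # Two-sided STFT (input is complex); each column is one spectrogram frame.
    _, _, Z = stft(baseband, fs=FS, nperseg=2048, noverlap=1536,
                   return_onesided=False)
    return np.abs(Z).T
```

A Press then appears as a burst of energy on one side of the carrier bin, a Hold as energy concentrated at the carrier, and a Release as a burst on the opposite side, which is what makes the pattern separable frame by frame.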
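The abstract quotes a budget of roughly 5K parameters per inference. As a rough plausibility check, here is a minimal dilated temporal convolutional classifier in PyTorch that lands near that budget. The channel widths, dilations, input bin count, and the four output classes (Press / Hold / Release / background) are assumptions for illustration; this is not the authors' architecture.

```python
# Sketch: a tiny TCN-style classifier over sequences of spectrogram frames.
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    def __init__(self, n_bins: int = 32, n_classes: int = 4):
        super().__init__()
        # Dilated 1-D convolutions over the time (frame) axis. Symmetric
        # "same" padding is used for brevity; a strictly causal TCN would
        # pad on the left only and trim the tail.
        self.net = nn.Sequential(
            nn.Conv1d(n_bins, 20, kernel_size=3, dilation=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(20, 20, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(20, 20, kernel_size=3, dilation=4, padding=4),
            nn.ReLU(),
        )
        self.head = nn.Conv1d(20, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_bins, n_frames) -> per-class logits via time pooling.
        return self.head(self.net(x)).mean(dim=-1)

model = TinyTCN()
print(sum(p.numel() for p in model.parameters()))  # ~4.5K parameters
logits = model(torch.randn(1, 32, 100))            # (1, 4) class logits
```

Three dilated layers of width 20 already total about 4.5K parameters, so a classifier of the quoted size is entirely realistic for distinguishing the few P/H/R classes.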