Policy Based Synthesis: Data Generation and Augmentation Methods For RF Machine Learning
Rob Miller, S. Kokalj-Filipovic, Garrett M. Vanhoy, Joshua Morman
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), November 2019. DOI: 10.1109/GlobalSIP45357.2019.8969160
Citations: 4
Abstract
Current dataset generation methods for RF machine learning (RFML) tasks rely either on fully synthetic data or on raw digitized data captured from an RF front end. Synthetic datasets are often unrealistic in their waveforms or protocols, while raw captures are typically unlabeled or mislabeled and can lead machine learning algorithms to focus on non-salient features. Further, the associated storage and processing requirements are large. In this work, a novel dataset generation and augmentation method called policy-based synthesis is presented. It aims to address the shortcomings of both approaches by combining basic protocol knowledge with simulated channel and device impairments to supplement over-the-air captures made in a controlled environment. The method permits the learning of salient features and regularizes against radio and device anomalies that are not of interest. Practical considerations for collecting and processing data for this hybrid approach are also detailed, and examples are provided on a dataset that includes protocols commonly used in the 2.4 GHz ISM band, such as Bluetooth and Wi-Fi.
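The abstract describes supplementing controlled over-the-air captures with simulated channel and device impairments. The paper's actual impairment models and policies are not given here, so the following is only a minimal sketch under assumed models: a random phase offset, a carrier frequency offset (CFO), and additive white Gaussian noise (AWGN) at a target SNR, applied to complex baseband samples. The function names (`apply_cfo`, `add_awgn`, `augment`) and the parameter ranges are hypothetical. The property it illustrates is that impairments are applied to an already-labeled capture, so the protocol label (e.g. Bluetooth or Wi-Fi) carries over unchanged to every augmented copy.

```python
import cmath
import math
import random


def apply_cfo(samples, cfo_hz, sample_rate_hz):
    """Carrier frequency offset: rotate sample n by a phase growing linearly in n."""
    step = 2 * math.pi * cfo_hz / sample_rate_hz
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]


def add_awgn(samples, snr_db, rng):
    """Add complex white Gaussian noise so that signal power / noise power = snr_db."""
    signal_power = sum(abs(s) ** 2 for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # noise power is split between I and Q
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in samples]


def augment(samples, sample_rate_hz, rng,
            cfo_range_hz=(-20e3, 20e3),   # hypothetical range, not from the paper
            snr_range_db=(5.0, 25.0)):    # hypothetical range, not from the paper
    """Apply one randomly drawn set of impairments to a labeled capture.

    Returns the impaired samples plus the drawn parameters so they can be
    stored next to the (unchanged) protocol label.
    """
    cfo = rng.uniform(*cfo_range_hz)
    snr = rng.uniform(*snr_range_db)
    phase = rng.uniform(0, 2 * math.pi)
    rotated = [s * cmath.exp(1j * phase) for s in samples]  # static phase offset
    shifted = apply_cfo(rotated, cfo, sample_rate_hz)
    noisy = add_awgn(shifted, snr, rng)
    return noisy, {"cfo_hz": cfo, "snr_db": snr, "phase_rad": phase}


if __name__ == "__main__":
    rng = random.Random(0)
    capture = [cmath.exp(1j * 0.1 * n) for n in range(1000)]  # stand-in for a real capture
    label = "bluetooth"  # the label is reused as-is for each augmented copy
    for _ in range(3):
        copy, params = augment(capture, 20e6, rng)
        print(label, len(copy), params)
```

Drawing fresh parameters for each copy lets one controlled capture yield many labeled variants. Keeping the returned parameter dictionary alongside the label may also help when checking which impairments a trained model is sensitive to.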