Quantifying Dataset Quality in Radio Frequency Machine Learning
Authors: William H. Clark, Alan J. Michaels
Venue: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM)
Published: 2021-11-29
DOI: https://doi.org/10.1109/MILCOM52596.2021.9652987
Citations: 2
Abstract
Because data is central to machine learning systems, quantifying how the quality of the available data affects final performance is a vital part of development. By parametrically varying the amount of available data and examining the relationship between a dataset's quantity and the trained system's performance, developers can gain new insights and answer questions more efficiently. A quality metric better enables the developer to ask what a given dataset contains and how it improves or hurts the performance of the trained network, which in turn allows a deeper investigation and understanding of the unknowns the system must account for. This work establishes an approach for regressing the relationship between data quantity and system performance in a way that enables a quantitative comparison of the quality of different datasets against a known good test set. Further, the approach provides an impartial means of comparing the value of data, whether generated or otherwise acquired, toward the end system's final performance.
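The abstract does not state which functional form the authors regress, so the following is only an illustrative sketch of the general idea. It assumes a log-linear relationship between training-set size and test accuracy, which is an assumption made here and not taken from the paper. The function names `fit_log_linear` and `examples_needed` are hypothetical, and the accuracy values are synthetic numbers invented for the example, not results from the paper.

```python
import numpy as np


def fit_log_linear(n_examples, performance):
    """Least-squares fit of performance = intercept + slope * log10(n_examples).

    The log-linear form is an assumption for illustration only.
    """
    x = np.log10(np.asarray(n_examples, dtype=float))
    y = np.asarray(performance, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def examples_needed(slope, intercept, target):
    """Invert the fitted line: training-set size predicted to reach `target`."""
    return 10 ** ((target - intercept) / slope)


# Synthetic accuracies on one shared test set, for two hypothetical
# training datasets A and B evaluated at the same training-set sizes.
sizes = [1_000, 10_000, 100_000]
acc_a = [0.50, 0.60, 0.70]
acc_b = [0.40, 0.45, 0.50]

slope_a, intercept_a = fit_log_linear(sizes, acc_a)
slope_b, intercept_b = fit_log_linear(sizes, acc_b)

# Under this assumed model, a larger slope means each tenfold increase in
# data buys more accuracy, which is one way to compare dataset quality.
print(f"A: slope={slope_a:.3f}, intercept={intercept_a:.3f}")
print(f"B: slope={slope_b:.3f}, intercept={intercept_b:.3f}")
print(f"A needs about {examples_needed(slope_a, intercept_a, 0.80):.0f} examples for 0.80")
```

Comparing the fitted slopes, or the predicted number of examples needed to hit a target accuracy, gives a single number per dataset against the same test set. Any such extrapolation is only as trustworthy as the assumed curve shape.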