U. Rebbapragada, L. Mandrake, K. Wagstaff, D. Gleeson, R. Castaño, Steve Ankuo Chien, C. Brodley
Improving onboard analysis of Hyperion images by filtering mislabeled training data examples
2009 IEEE Aerospace Conference. Published 2009-03-07. DOI: 10.1109/AERO.2009.4839580 (https://doi.org/10.1109/AERO.2009.4839580)
Citations: 15
Abstract
This paper presents PWEM, a technique for detecting class label noise in training data. PWEM detects mislabeled examples by assigning to each training example a probability that its label is correct. It calculates this probability by clustering examples from pairs of classes together and analyzing the distribution of labels within each cluster. We discuss how the probabilities output by PWEM can be used to filter, mitigate, or correct mislabeled training examples. We then provide an in-depth discussion of how we applied PWEM to a sulfur detector that labels pixels from Hyperion images of Borup Fiord Pass in northern Canada. PWEM assigned low probabilities to a large number of the sulfur training examples, indicating severe mislabeling within the sulfur class. Filtering those low-confidence examples resulted in a cleaner training set and improved the median false positive rate of the classifier by at least 29%.
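The pairwise clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small hard k-means as a stand-in for the clustering step, takes the confidence of a label to be the fraction of its cluster that shares that label, and averages over class pairs. The names `kmeans`, `label_confidence`, the threshold of 0.5, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed stand-in for the clustering step)."""
    # Farthest-point initialisation avoids randomness in this sketch.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def label_confidence(X, y, k=2):
    """Per-example label confidence, averaged over all pairs of classes.

    Assumes at least two distinct labels in y.
    """
    scores = defaultdict(list)
    for a, b in combinations(sorted(set(y)), 2):
        # Cluster only the examples that belong to this pair of classes.
        idx = [i for i, lab in enumerate(y) if lab in (a, b)]
        assign = kmeans([X[i] for i in idx], k)
        for c in range(k):
            members = [i for i, cl in zip(idx, assign) if cl == c]
            for i in members:
                # Share of the cluster that carries the same label as example i.
                same = sum(1 for m in members if y[m] == y[i])
                scores[i].append(same / len(members))
    return [sum(scores[i]) / len(scores[i]) for i in range(len(y))]


# Toy data: the last point sits in the "bg" blob but is labeled "sulfur".
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (0.15, 0.05)]
y = ["bg", "bg", "bg", "bg", "sulfur", "sulfur", "sulfur", "sulfur"]

conf = label_confidence(X, y)
# Filtering: keep only examples whose label confidence clears the threshold.
kept = [i for i, c in enumerate(conf) if c >= 0.5]
```

On this toy set the suspect point receives a low confidence and is dropped by the filter. The same scores could instead be used as training weights (mitigation) or to relabel low-confidence examples (correction), the other two uses the abstract mentions.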