Wrapper Methods to Correct Mislabelled Training Data
Jonathan Young, J. Ashburner, S. Ourselin
2013 International Workshop on Pattern Recognition in Neuroimaging, published 2013-06-22
DOI: 10.1109/PRNI.2013.51 (https://doi.org/10.1109/PRNI.2013.51)
Citations: 14
Abstract
Machine learning has clear applications to the diagnosis of disease. For many neurological conditions, features extracted from brain images allow classifiers based on neuroimaging biomarkers to usefully complement more traditional diagnostic methods based on symptoms and psychological testing. However, the labels used to train such systems frequently depend on standard clinical diagnostic methods, so in many cases they are not completely reliable. Because the true labels are uncertain, the resulting problems are hard to study: it is difficult to measure both the extent of mislabelling and its effect on results. To avoid this problem, we classify gender from imaging data, as this is known with certainty for each subject. We then deliberately make known proportions of the training labels incorrect. This allows us to assess the effect of the level of label noise on classification accuracy, and to evaluate methods that allow for the mislabelled data. The methods are wrappers around existing well-known classifier algorithms. The results indicate that the methods can be significantly effective at realistic levels of noise in the training labels, but care must be taken in choosing which method to apply depending on the level of label noise.
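The experimental idea in the abstract (corrupt a known proportion of trusted training labels, then wrap an existing classifier so that it tolerates the corruption) can be illustrated with a minimal sketch. The abstract does not say which wrapper methods the authors used, so the cross-validated filtering shown here is only one common possibility and should not be read as the paper's method. The nearest-centroid base classifier, the synthetic two-cluster data, and all function names (`flip_labels`, `centroid_fit`, `centroid_predict`, `filter_wrapper`) are assumptions made for illustration.

```python
import random


def flip_labels(labels, proportion, seed=0):
    """Return a copy of binary (0/1) labels with a known proportion flipped."""
    rng = random.Random(seed)
    n_flip = round(proportion * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flipped else y for i, y in enumerate(labels)]


def centroid_fit(X, y):
    """Toy base classifier (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def filter_wrapper(X, y, n_folds=5):
    """Keep indices whose label agrees with the out-of-fold prediction."""
    keep = []
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        keep += [i for i in held_out if centroid_predict(model, X[i]) == y[i]]
    return sorted(keep)


# Synthetic stand-in for imaging features: two separated 2-D clusters.
rng = random.Random(1)
true_y = [i % 2 for i in range(200)]
X = [[rng.gauss(6.0 * t, 1.0), rng.gauss(6.0 * t, 1.0)] for t in true_y]

# Deliberately corrupt 20% of the training labels, then filter.
noisy_y = flip_labels(true_y, 0.2)
kept = filter_wrapper(X, noisy_y)
residual = sum(noisy_y[i] != true_y[i] for i in kept) / len(kept)
print(f"label noise before: 0.20, after filtering: {residual:.2f}")
```

Because the true labels are known, the residual noise among the retained samples can be measured directly, which is the advantage the abstract attributes to using a label that is certain for every subject. On data this cleanly separated the filter is expected to remove almost all flipped labels. Real imaging features overlap far more, so a filter of this kind would also discard correctly labelled samples, which is in keeping with the abstract's caution that the choice of method depends on the noise level.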