{"title":"为什么误差测量是训练神经网络模式分类器的次优","authors":"J. Hampshire, B. V. Vijaya Kumar","doi":"10.1109/IJCNN.1992.227338","DOIUrl":null,"url":null,"abstract":"Pattern classifiers that are trained in a supervised fashion are typically trained with an error measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield Bayesian discrimination, but in practice they often fail to do so. The authors explain why this happens and identify a number of characteristics that the optimal objective function for training classifiers must have. They show that classification figures of merit (CFM/sub mono/) possess these optimal characteristics, whereas error measures such as MSE and CE do not. The arguments are illustrated with a simple example in which a CFM/sub mono/-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest number of training samples and the minimum functional complexity necessary for the task. A comparable MSE-trained net yields significantly worse discrimination on the same task.<<ETX>>","PeriodicalId":286849,"journal":{"name":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","volume":"2014 12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Why error measures are sub-optimal for training neural network pattern classifiers\",\"authors\":\"J. Hampshire, B. V. Vijaya Kumar\",\"doi\":\"10.1109/IJCNN.1992.227338\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pattern classifiers that are trained in a supervised fashion are typically trained with an error measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield Bayesian discrimination, but in practice they often fail to do so. The authors explain why this happens and identify a number of characteristics that the optimal objective function for training classifiers must have. They show that classification figures of merit (CFM/sub mono/) possess these optimal characteristics, whereas error measures such as MSE and CE do not. The arguments are illustrated with a simple example in which a CFM/sub mono/-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest number of training samples and the minimum functional complexity necessary for the task. 
A comparable MSE-trained net yields significantly worse discrimination on the same task.<<ETX>>\",\"PeriodicalId\":286849,\"journal\":{\"name\":\"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks\",\"volume\":\"2014 12\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1992-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.1992.227338\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.1992.227338","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Why error measures are sub-optimal for training neural network pattern classifiers
Pattern classifiers that are trained in a supervised fashion are typically trained with an error-measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield Bayesian discrimination, but in practice they often fail to do so. The authors explain why this happens and identify a number of characteristics that the optimal objective function for training classifiers must have. They show that the classification figure of merit (CFM_mono) possesses these optimal characteristics, whereas error measures such as MSE and CE do not. The arguments are illustrated with a simple example in which a CFM_mono-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest training samples and the minimum functional complexity necessary for the task. A comparable MSE-trained net yields significantly worse discrimination on the same task.
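A minimal sketch of the comparison the abstract describes, assuming two overlapping Gaussian class-conditional densities on a scalar input and a low-order polynomial discriminant. The CFM_mono-style objective below is a generic monotonic (sigmoid-of-margin) surrogate, not the exact figure of merit defined in the paper, and all names and parameters are illustrative.

```python
# Hypothetical sketch: MSE vs. a CFM_mono-style objective for a two-class,
# scalar-input, low-order polynomial classifier. The CFM surrogate is a
# sigmoid of the inter-class output margin, not the paper's exact form.
import numpy as np

rng = np.random.default_rng(0)

# Two overlapping Gaussian class-conditional densities on a random scalar x.
n = 50                                   # few training samples per class
x = np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(+1.0, 1.0, n)])
y = np.concatenate([np.zeros(n, int), np.ones(n, int)])  # class labels

def features(x):
    """Low-order polynomial feature map: [1, x, x^2]."""
    return np.stack([np.ones_like(x), x, x * x], axis=1)

def outputs(W, x):
    """Two discriminant outputs, one per class."""
    return features(x) @ W               # shape (N, 2)

def grad_mse(W, x, y):
    """Gradient of mean-squared error against one-hot targets."""
    out = outputs(W, x)
    target = np.eye(2)[y]
    return features(x).T @ (out - target) / len(x)

def grad_cfm(W, x, y, gamma=4.0):
    """Gradient of a CFM_mono-style objective: maximize a sigmoid of the
    margin (correct-class output minus the competing output)."""
    out = outputs(W, x)
    idx = np.arange(len(x))
    margin = out[idx, y] - out[idx, 1 - y]
    s = 1.0 / (1.0 + np.exp(-gamma * margin))
    # derivative of (-mean sigmoid) w.r.t. margin, routed to the two outputs
    dmargin = -gamma * s * (1 - s) / len(x)
    g = np.zeros_like(out)
    g[idx, y] = dmargin
    g[idx, 1 - y] = -dmargin
    return features(x).T @ g

def train(grad_fn, lr=0.1, steps=2000):
    W = np.zeros((3, 2))
    for _ in range(steps):
        W -= lr * grad_fn(W, x, y)
    return W

for name, grad_fn in [("MSE", grad_mse), ("CFM_mono-style", grad_cfm)]:
    W = train(grad_fn)
    acc = np.mean(outputs(W, x).argmax(axis=1) == y)
    print(f"{name:>15s} training accuracy: {acc:.2f}")
```

The sketch only illustrates the setup (error-measure training versus a margin-based figure of merit on the same low-order network); it does not reproduce the paper's quantitative result that the MSE-trained net discriminates significantly worse.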