Error signal distribution as an indicator of imbalanced data
Authors: D. Furundžić, S. Stankovic, Goran Dimić
Published in: 12th Symposium on Neural Network Applications in Electrical Engineering (NEUREL), 2014-11-01
DOI: https://doi.org/10.1109/NEUREL.2014.7011503
Citations: 1
Abstract
This paper defines criteria for assessing the imbalance of datasets used to train predictive learning models. The most important criterion for evaluating imbalance is the distribution of the error signal over the space of a local measure of distance between the points of the training set. The paper analyzes this indicator for sets with various distributions and shows that, for the identity mapping of datasets from the real domain, the greatest information potential is carried by data whose internal distribution is uniform.
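The abstract does not say how the error signal or the local distance measure is computed, so the following is only a rough sketch of the general idea under stated assumptions. It assumes one-dimensional real inputs, takes the local distance measure to be the mean gap to the neighbouring points, and uses a leave-one-out 1-nearest-neighbour predictor as a stand-in for the trained model. The names `local_spacing`, `loo_nn_error`, and `imbalance_score` are hypothetical and do not come from the paper.

```python
import numpy as np


def local_spacing(x):
    """Assumed local distance measure: mean gap to the adjacent points in sorted order."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    gaps = np.diff(x[order])
    # Edge points have only one neighbour, so reuse their single gap.
    left = np.concatenate(([gaps[0]], gaps))
    right = np.concatenate((gaps, [gaps[-1]]))
    spacing = np.empty_like(x)
    spacing[order] = 0.5 * (left + right)  # map back to the original point order
    return spacing


def loo_nn_error(x, y):
    """Stand-in error signal: leave-one-out error of a 1-nearest-neighbour predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)  # a point may not be its own neighbour
    nearest = d.argmin(axis=1)
    return np.abs(y - y[nearest])


def imbalance_score(x):
    """Assumed summary: coefficient of variation of the local spacing (0 = evenly spread)."""
    s = local_spacing(x)
    return s.std() / s.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    samples = {
        "even grid": np.linspace(0.0, 1.0, n),
        "uniform random": rng.uniform(0.0, 1.0, n),
        "gaussian": rng.normal(0.5, 0.15, n),
    }
    for name, x in samples.items():
        y = x.copy()  # identity mapping: the target equals the input
        err = loo_nn_error(x, y)
        print(f"{name:15s} spacing CV={imbalance_score(x):.3f} "
              f"mean error={err.mean():.4f} max error={err.max():.4f}")
```

With this stand-in model the error at a point is driven by how far away its nearest neighbour lies, so sparsely sampled regions carry the largest errors. Any other model or distance measure could be substituted; the point of the sketch is only to show the error signal being examined against the local spacing of the training points.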