Training set selection using entropy based distance
Tomasz Kajdanowicz, S. Plamowski, Przemyslaw Kazienko
2011 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), published 2011-12-01
DOI: 10.1109/AEECT.2011.6132530 (https://doi.org/10.1109/AEECT.2011.6132530)
Citations: 4
Abstract
Distance measures, especially those between probability density functions, are essential in solving machine learning problems. Such problems include classification and clustering as well as data reduction and selection. This paper describes a new distance measure for comparing and selecting training datasets. The distance between two datasets is based on the variance of entropy in the groups obtained by clustering the two datasets jointly. The proposed approach is examined for dataset selection in the prediction of debt portfolio value. Finally, a basic evaluation of prediction performance is conducted.
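
The abstract does not give the exact formula, so the following is only a minimal sketch of one possible reading of "variance of entropy in groups obtained by clustering joint datasets". It assumes that the two datasets are pooled, that the pool is clustered with plain k-means, and that each group's entropy is the Shannon entropy of the dataset-of-origin labels inside it. The function names (`kmeans_labels`, `binary_entropy`, `entropy_variance_distance`), the choice of k-means, and the parameters `k`, `iters`, and `seed` are all illustrative assumptions, not the authors' method.

```python
import math
import random


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def binary_entropy(p):
    """Shannon entropy in bits of the two-outcome distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy_variance_distance(data_a, data_b, k=3, seed=0):
    """Variance, across clusters, of the entropy of dataset-of-origin labels.

    Assumed reading of the abstract, not the paper's definition.
    """
    joint = list(data_a) + list(data_b)
    origin = [0] * len(data_a) + [1] * len(data_b)  # 0 = from A, 1 = from B
    labels = kmeans_labels(joint, k, seed=seed)

    entropies = []
    for j in range(k):
        origins = [o for o, lab in zip(origin, labels) if lab == j]
        if origins:  # skip empty clusters
            share_b = sum(origins) / len(origins)
            entropies.append(binary_entropy(share_b))

    mean = sum(entropies) / len(entropies)
    return sum((h - mean) ** 2 for h in entropies) / len(entropies)


# Hypothetical toy data: A covers two regions, B covers only one of them.
a = [(0.0,), (0.1,), (10.0,), (10.1,)]
b = [(0.05,), (0.15,)]
print(entropy_variance_distance(a, b, k=2))
```

Under this reading, a variance of zero arises both when every group is evenly mixed and when every group contains points from only one dataset, so the paper's full definition presumably carries more detail than the abstract states.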