Non-parametric Regression and Random Balance Method Modification for Determination of the Most Informative Features
Kseniya Zablotskaya, Mumtaz Ahmed, Sergey Zablotskiy, W. Minker
2010 Sixth International Conference on Intelligent Environments, 2010-07-18. DOI: 10.1109/IE.2010.19
In this paper we present a new method for detecting the most informative features among all features extracted from a given data corpus. The widely used Pearson correlation coefficient is not reliable when the dependency between the extracted features (input variables) and the objective function (output) is non-linear. Our approach combines a modified random balance method (RBM) with non-parametric kernel regression, which models the dependency between the output and the input variables. The standard random balance method stochastically determines the most important features of a process, but it requires the values of the objective function at specific assigned points. If these values cannot be calculated directly, they must be approximated. Since we assume that the dependency between the stochastic variables may be non-linear, an appropriate model is required. We use non-parametric kernel regression because it does not require knowledge of the parametric structure of the dependency. Moreover, we modified the random balance method to handle the non-linearity of the data.
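The abstract does not specify the authors' modification of the random balance method. The Python/NumPy sketch below therefore illustrates only the general pipeline it describes, and it rests on assumptions of our own:

- a Gaussian-kernel Nadaraya-Watson estimator serves as the non-parametric regression;
- each feature gets three levels taken from its empirical quantiles;
- a feature's effect is measured as the spread of the mean approximated output across its levels.

The function names `nadaraya_watson` and `random_balance_effects`, the bandwidth, the quantiles, and the synthetic data are all hypothetical. This is not the paper's algorithm.

```python
# Illustrative sketch only: the design choices below are assumptions,
# not the authors' modified random balance method.
import numpy as np


def nadaraya_watson(X_train, y_train, X_query, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x] at each query row."""
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shift by the per-row minimum before exponentiating: the constant factor
    # cancels in the ratio, and the denominator can never underflow to zero.
    w = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)) / bandwidth ** 2)
    return (w @ y_train) / w.sum(axis=1)


def random_balance_effects(X, y, n_runs=600, quantiles=(0.1, 0.5, 0.9),
                           bandwidth=0.2, seed=0):
    """Screen features with a random balance design evaluated on a kernel surrogate.

    Each feature gets len(quantiles) levels from its empirical quantiles. Every
    design column is an independent random permutation of a balanced set of level
    indices, so the other features' effects average out within each level. The
    objective is unknown at these design points, so it is approximated by kernel
    regression fitted on the observed (X, y).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    n_levels = len(quantiles)
    levels = np.quantile(X, quantiles, axis=0)   # shape (n_levels, n_features)
    n_runs -= n_runs % n_levels                   # keep every column exactly balanced
    design_idx = np.column_stack([
        rng.permutation(np.repeat(np.arange(n_levels), n_runs // n_levels))
        for _ in range(n_features)
    ])
    design = levels[design_idx, np.arange(n_features)]  # (n_runs, n_features)
    y_hat = nadaraya_watson(X, y, design, bandwidth)     # approximated objective
    # Effect of feature j: spread of the mean surrogate output across its levels.
    effects = np.empty(n_features)
    for j in range(n_features):
        level_means = [y_hat[design_idx[:, j] == k].mean() for k in range(n_levels)]
        effects[j] = max(level_means) - min(level_means)
    return effects


# Hypothetical synthetic "corpus": x0 acts non-linearly (U-shape),
# x1 acts linearly, and x2 is irrelevant.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(400, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=400)

pearson = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(3)])
effects = random_balance_effects(X, y)
print("|Pearson r|:", np.round(np.abs(pearson), 2))
print("RBM effects:", np.round(effects, 2))
```

With this synthetic target, Pearson's r for x0 should be close to zero even though x0 drives the output. The screening step, by contrast, should rank x0 well above the irrelevant x2. Three levels are used because with only two levels placed symmetrically around zero, the U-shaped effect of x0 would cancel out. That choice is ours and is not taken from the paper.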