Author: A. V. Bulinski
Journal: Theory of Probability and its Applications, volume 73 1
Published: 2023-11-01
DOI: https://doi.org/10.1137/s0040585x97t991520
On Relevant Features Selection Based on Information Theory
It is shown that widely used suboptimal algorithms of feature selection based on information theory concepts do not necessarily identify a collection of features (relevant in a sense) affecting the studied random response. This can be considered as a reflection of the epistasis phenomenon known in genetics, when individual features have little effect on increased risk of complex disease, whereas certain combinations of features have significant impact on risk. It is demonstrated that a similar effect is also manifested in inferences employing statistical estimates of mutual information.
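The epistasis effect described above can be illustrated with a standard XOR toy model. This is a minimal sketch, not the construction used in the paper: the variables `X1`, `X2`, `X3`, the response `Y = X1 XOR X2`, and the helper names are all assumptions introduced here for illustration. It computes exact (population) mutual information in bits and shows that each feature taken alone carries no information about the response, while the pair `(X1, X2)` determines it completely.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Exact mutual information in bits for a list of equally likely (x, y) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical toy model: three independent fair bits, response Y = X1 XOR X2.
# X3 is irrelevant by construction.
outcomes = [((x1, x2, x3), x1 ^ x2) for x1, x2, x3 in product((0, 1), repeat=3)]


def mi_of_subset(indices):
    """Mutual information between the selected feature subset and the response."""
    return mutual_information(
        [(tuple(x[i] for i in indices), y) for x, y in outcomes]
    )


single = [mi_of_subset((i,)) for i in range(3)]  # each feature on its own
pair = mi_of_subset((0, 1))                      # the relevant pair (X1, X2)

print("I(X_i; Y) for i = 1, 2, 3:", single)
print("I((X1, X2); Y):", pair)
```

In this toy model a greedy procedure that ranks features by their individual mutual information with the response sees a three-way tie at zero in its first step, so it cannot tell the relevant features `X1` and `X2` apart from the irrelevant `X3`. The sketch uses exact probabilities only; it does not address the abstract's second point about statistical estimates of mutual information.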
About the journal:
Theory of Probability and Its Applications (TVP) accepts original articles and communications on the theory of probability, general problems of mathematical statistics, and applications of the theory of probability to natural science and technology. Articles of the latter type will be accepted only if the mathematical methods applied are essentially new.