Finding reducts without building the discernibility matrix
M. Korzeń, S. Jaroszewicz
5th International Conference on Intelligent Systems Design and Applications (ISDA'05), 2005-09-08
DOI: 10.1109/ISDA.2005.45 (https://doi.org/10.1109/ISDA.2005.45)
Citation count: 18
Abstract
We present algorithms for fast generation of short reducts that avoid building the discernibility matrix explicitly. We show how the information normally read from this matrix can be derived solely from the distributions of attribute values. Since the size of the discernibility matrix is quadratic in the number of data records, not building it explicitly gives a significant speedup and makes it possible to find reducts even in very large databases. Algorithms are given for both absolute and relative reducts. Experiments show that our approach outperforms other reduct-finding algorithms. Furthermore, it is shown that many heuristic reduct-finding algorithms that use the discernibility matrix in fact select attributes based on their Gini index. A new definition of the conditional Gini index is presented, motivated by reduct-finding heuristics.
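The link between the discernibility matrix and value distributions can be illustrated with a small sketch. It is not the authors' algorithm, only a standard counting identity that is consistent with the abstract: the number of record pairs an attribute discerns equals (n² − Σ nᵥ²)/2, which is n²/2 times the Gini index of that attribute. The function names, the inclusion-exclusion form used for the decision-relative case, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def discerned_pairs_bruteforce(column):
    """Count record pairs with different values by checking every pair.

    This is the O(n^2) work an explicit discernibility matrix implies.
    """
    return sum(1 for a, b in combinations(column, 2) if a != b)


def discerned_pairs_from_counts(column):
    """Same count computed from the value distribution only, in O(n)."""
    n = len(column)
    counts = Counter(column)
    return (n * n - sum(c * c for c in counts.values())) // 2


def gini_index(column):
    """Gini index 1 - sum(p_v^2) of the attribute's value distribution."""
    n = len(column)
    return 1.0 - sum((c / n) ** 2 for c in Counter(column).values())


def relative_discerned_pairs_from_counts(column, decision):
    """Pairs that differ on both the attribute and the decision.

    Assumed illustration of the relative case via inclusion-exclusion:
    all ordered pairs, minus pairs equal on the attribute, minus pairs
    equal on the decision, plus pairs equal on both (removed twice).
    """
    n = len(column)
    same_attr = sum(c * c for c in Counter(column).values())
    same_dec = sum(c * c for c in Counter(decision).values())
    same_both = sum(c * c for c in Counter(zip(column, decision)).values())
    return (n * n - same_attr - same_dec + same_both) // 2


# Hypothetical toy data: one attribute column and one decision column.
column = ["a", "a", "b", "c", "b", "a"]
decision = [0, 1, 0, 0, 1, 1]

print(discerned_pairs_bruteforce(column))    # 11
print(discerned_pairs_from_counts(column))   # 11
print(gini_index(column) * len(column) ** 2 / 2)
print(relative_discerned_pairs_from_counts(column, decision))  # 6
```

Because the count-based versions need only one pass over the data, a greedy heuristic that ranks attributes by the number of pairs they discern can be driven by value counts alone, which is the kind of saving the abstract describes.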