{"title":"粗糙集在数据挖掘领域中的应用","authors":"A. Butalia, M. Dhore, Geetika Tewani","doi":"10.1109/ICETET.2008.143","DOIUrl":null,"url":null,"abstract":"The issues of Real World are Very large data sets, Mixed types of data (continuous valued, symbolic data), Uncertainty (noisy data), Incompleteness (missing, incomplete data), Data change, Use of background knowledge etc. The main goal of the rough set analysis is induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD. It offers mathematical tools to discover patterns hidden in data [4] and hence used in the field of data mining. Rough Sets does not require any preliminary information as Fuzzy sets require membership values or probability is required in statistics. Hence this is its specialty. Two novel algorithms to find optimal reducts of condition attributes based on the relative attribute dependency are implemented using Java 1.5, out of which the first algorithms gives simple reduct whereas the second one gives the reduct with minimum attributes, The presented implementation serves as a prototype system for extracting decision rules, which is the first module. Second module gives positive regions for dependencies. Third module is reducts for calculating the minimum attributes to decide decision, with two techniques, first with brute force backward elimination which simply selects the attributes in the given order to check if they should be eliminated, and the second technique is the information entropy-based algorithm which calculates the information entropy conveyed in each attribute and selects the one with the maximum information gain for elimination. Fourth modules describes the Equivalence classes for Classification including lower and upper approximation for implementing hard computing and soft computing respectively and last module is the discernibility matrix and functions which is used that stores the differences between attribute values for each pair of data tuples. Rather than searching on the entire training set, the matrix is instead searched to detect redundant attributes. All these ultimately constitute the modules of the system. The implemented system is tested on a small sized application first to verity the mathematical calculations involved which is not practically feasible with large database. It is also tested on a medium sized application example to illustrate the usefulness of the system and the incorporated language.","PeriodicalId":269929,"journal":{"name":"2008 First International Conference on Emerging Trends in Engineering and Technology","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Applications of Rough Sets in the Field of Data Mining\",\"authors\":\"A. Butalia, M. Dhore, Geetika Tewani\",\"doi\":\"10.1109/ICETET.2008.143\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The issues of Real World are Very large data sets, Mixed types of data (continuous valued, symbolic data), Uncertainty (noisy data), Incompleteness (missing, incomplete data), Data change, Use of background knowledge etc. The main goal of the rough set analysis is induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD. It offers mathematical tools to discover patterns hidden in data [4] and hence used in the field of data mining. 
Rough Sets does not require any preliminary information as Fuzzy sets require membership values or probability is required in statistics. Hence this is its specialty. Two novel algorithms to find optimal reducts of condition attributes based on the relative attribute dependency are implemented using Java 1.5, out of which the first algorithms gives simple reduct whereas the second one gives the reduct with minimum attributes, The presented implementation serves as a prototype system for extracting decision rules, which is the first module. Second module gives positive regions for dependencies. Third module is reducts for calculating the minimum attributes to decide decision, with two techniques, first with brute force backward elimination which simply selects the attributes in the given order to check if they should be eliminated, and the second technique is the information entropy-based algorithm which calculates the information entropy conveyed in each attribute and selects the one with the maximum information gain for elimination. Fourth modules describes the Equivalence classes for Classification including lower and upper approximation for implementing hard computing and soft computing respectively and last module is the discernibility matrix and functions which is used that stores the differences between attribute values for each pair of data tuples. Rather than searching on the entire training set, the matrix is instead searched to detect redundant attributes. All these ultimately constitute the modules of the system. The implemented system is tested on a small sized application first to verity the mathematical calculations involved which is not practically feasible with large database. It is also tested on a medium sized application example to illustrate the usefulness of the system and the incorporated language.\",\"PeriodicalId\":269929,\"journal\":{\"name\":\"2008 First International Conference on Emerging Trends in Engineering and Technology\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-07-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 First International Conference on Emerging Trends in Engineering and Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICETET.2008.143\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 First International Conference on Emerging Trends in Engineering and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICETET.2008.143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Applications of Rough Sets in the Field of Data Mining
Real-world data pose several challenges: very large data sets, mixed data types (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), changing data, and the use of background knowledge. The main goal of rough set analysis is the induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD: they provide mathematical tools for discovering patterns hidden in data [4] and are therefore used in data mining. Unlike fuzzy sets, which require membership values, or statistics, which requires probabilities, rough sets need no preliminary information about the data; this is their distinguishing strength.

Two novel algorithms for finding optimal reducts of condition attributes, based on relative attribute dependency, are implemented in Java 1.5; the first yields a simple reduct, while the second yields a reduct with the minimum number of attributes. The implementation serves as a prototype system for extracting decision rules, which forms the first module. The second module computes positive regions for attribute dependencies. The third module computes reducts, the minimal attribute sets needed to determine the decision, using two techniques: brute-force backward elimination, which examines the attributes in the given order and checks whether each can be eliminated, and an information entropy-based algorithm, which calculates the entropy conveyed by each attribute and selects the one with the maximum information gain for elimination. The fourth module builds the equivalence classes used for classification, including the lower and upper approximations, which implement hard computing and soft computing respectively. The last module is the discernibility matrix and its functions, which store the differences between attribute values for each pair of data tuples; rather than searching the entire training set, the matrix is searched to detect redundant attributes. Together, these modules constitute the system. The implemented system is tested first on a small application to verify the mathematical calculations involved, which would not be practically feasible with a large database, and then on a medium-sized application example to illustrate the usefulness of the system and the incorporated language.
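The modules above rest on a few standard rough-set constructions. The following is a minimal, self-contained sketch (not the authors' implementation) of the indiscernibility-based equivalence classes and the lower and upper approximations used in the fourth module; the table layout, class names, and toy data are illustrative assumptions.

import java.util.*;

public class RoughSetSketch {

    // Group row indices by their values on the chosen condition attributes
    // (the indiscernibility relation induces these equivalence classes).
    static Map<List<String>, Set<Integer>> equivalenceClasses(String[][] table, int[] attrs) {
        Map<List<String>, Set<Integer>> classes = new HashMap<List<String>, Set<Integer>>();
        for (int row = 0; row < table.length; row++) {
            List<String> key = new ArrayList<String>();
            for (int a : attrs) {
                key.add(table[row][a]);
            }
            Set<Integer> members = classes.get(key);
            if (members == null) {
                members = new HashSet<Integer>();
                classes.put(key, members);
            }
            members.add(row);
        }
        return classes;
    }

    // Lower approximation: union of classes fully contained in the concept (certain members).
    // Upper approximation: union of classes that overlap the concept (possible members).
    static Set<Integer> approximate(Map<List<String>, Set<Integer>> classes,
                                    Set<Integer> concept, boolean lower) {
        Set<Integer> result = new HashSet<Integer>();
        for (Set<Integer> eq : classes.values()) {
            if (lower ? concept.containsAll(eq) : !Collections.disjoint(eq, concept)) {
                result.addAll(eq);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy table: two condition attributes (columns 0-1) and a decision (column 2).
        String[][] table = {
            {"high", "yes", "flu"},
            {"high", "no",  "flu"},
            {"low",  "yes", "cold"},
            {"high", "no",  "cold"},  // conflicts with row 1, so rows 1 and 3 fall in the boundary
        };
        Set<Integer> flu = new HashSet<Integer>(Arrays.asList(0, 1));
        Map<List<String>, Set<Integer>> classes = equivalenceClasses(table, new int[]{0, 1});
        System.out.println("lower: " + approximate(classes, flu, true));   // [0] -- certain flu rows
        System.out.println("upper: " + approximate(classes, flu, false));  // [0, 1, 3] -- possible flu rows
    }
}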
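The relative attribute dependency behind the second and third modules can be sketched the same way: the positive region collects the rows whose condition-equivalence class is decision-consistent, and brute-force backward elimination drops any attribute whose removal does not lower the dependency degree. Method names and the column convention (decision attribute last) are assumptions for illustration, not the paper's interface.

import java.util.*;

public class ReductSketch {

    // Dependency degree gamma(C, d): the fraction of rows lying in the positive
    // region of the decision (last column) with respect to condition attributes attrs.
    static double dependency(String[][] table, List<Integer> attrs) {
        Map<List<String>, Set<String>> decisionsPerClass = new HashMap<List<String>, Set<String>>();
        Map<List<String>, Integer> sizePerClass = new HashMap<List<String>, Integer>();
        int decision = table[0].length - 1;
        for (String[] row : table) {
            List<String> key = new ArrayList<String>();
            for (int a : attrs) key.add(row[a]);
            if (!decisionsPerClass.containsKey(key)) {
                decisionsPerClass.put(key, new HashSet<String>());
                sizePerClass.put(key, 0);
            }
            decisionsPerClass.get(key).add(row[decision]);
            sizePerClass.put(key, sizePerClass.get(key) + 1);
        }
        int positive = 0;
        for (Map.Entry<List<String>, Set<String>> e : decisionsPerClass.entrySet()) {
            // A class with exactly one decision value is decision-consistent.
            if (e.getValue().size() == 1) positive += sizePerClass.get(e.getKey());
        }
        return (double) positive / table.length;
    }

    // Brute-force backward elimination: visit attributes in the given order and
    // remove each one whose absence does not lower the dependency degree.
    static List<Integer> reduct(String[][] table) {
        List<Integer> attrs = new ArrayList<Integer>();
        for (int a = 0; a < table[0].length - 1; a++) attrs.add(a);
        double full = dependency(table, attrs);
        for (Iterator<Integer> it = attrs.iterator(); it.hasNext();) {
            Integer a = it.next();
            List<Integer> without = new ArrayList<Integer>(attrs);
            without.remove(a);  // remove by value, not index
            if (dependency(table, without) >= full) it.remove();
        }
        return attrs;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"high", "yes", "old",   "flu"},
            {"high", "no",  "young", "flu"},
            {"low",  "yes", "old",   "cold"},
            {"low",  "no",  "young", "cold"},
        };
        System.out.println("reduct: " + reduct(table));  // [0] -- the first attribute alone decides
    }
}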
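For the third module's second technique, a hedged sketch of the entropy computation: the Shannon entropy of the decision attribute and the information gain contributed by a single condition attribute. The elimination policy itself (the abstract states the attribute with the maximum information gain is selected for elimination) is not reproduced here; the code only shows the measurement.

import java.util.*;

public class EntropySketch {

    // Shannon entropy (in bits) of a multiset of labels.
    static double entropy(Collection<String> labels) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String l : labels) {
            Integer c = counts.get(l);
            counts.put(l, c == null ? 1 : c + 1);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain of condition attribute attr with respect to the decision (last column):
    // H(decision) minus the size-weighted entropy of the decision within each attribute value.
    static double gain(String[][] table, int attr) {
        int decision = table[0].length - 1;
        List<String> all = new ArrayList<String>();
        Map<String, List<String>> split = new HashMap<String, List<String>>();
        for (String[] row : table) {
            all.add(row[decision]);
            if (!split.containsKey(row[attr])) split.put(row[attr], new ArrayList<String>());
            split.get(row[attr]).add(row[decision]);
        }
        double conditional = 0.0;
        for (List<String> part : split.values()) {
            conditional += ((double) part.size() / table.length) * entropy(part);
        }
        return entropy(all) - conditional;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"high", "yes", "flu"},
            {"high", "no",  "flu"},
            {"low",  "yes", "cold"},
            {"low",  "no",  "cold"},
        };
        System.out.println("gain(attr 0) = " + gain(table, 0));  // 1.0 bit: fully determines the decision
        System.out.println("gain(attr 1) = " + gain(table, 1));  // 0.0 bits: carries no decision information
    }
}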
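Finally, a small sketch of the last module's discernibility matrix: for each pair of tuples with different decision values, it records the condition attributes on which they differ, so redundant attributes can be detected by searching the matrix rather than the whole training set. The string pair keys are an illustrative choice.

import java.util.*;

public class DiscernibilitySketch {

    // Entry (i, j) lists the condition attributes distinguishing rows i and j;
    // only pairs with different decision values (last column) are recorded.
    static Map<String, List<Integer>> discernibilityMatrix(String[][] table) {
        Map<String, List<Integer>> matrix = new LinkedHashMap<String, List<Integer>>();
        int decision = table[0].length - 1;
        for (int i = 0; i < table.length; i++) {
            for (int j = i + 1; j < table.length; j++) {
                if (table[i][decision].equals(table[j][decision])) continue;
                List<Integer> differing = new ArrayList<Integer>();
                for (int a = 0; a < decision; a++) {
                    if (!table[i][a].equals(table[j][a])) differing.add(a);
                }
                matrix.put("(" + i + "," + j + ")", differing);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"high", "yes", "flu"},
            {"low",  "yes", "cold"},
            {"low",  "no",  "cold"},
        };
        // Prints {(0,1)=[0], (0,2)=[0, 1]}: attribute 0 appears in every entry,
        // so attribute 1 is redundant for discerning the decisions here.
        System.out.println(discernibilityMatrix(table));
    }
}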