Applications of Rough Sets in the Field of Data Mining

A. Butalia, M. Dhore, Geetika Tewani
DOI: 10.1109/ICETET.2008.143
Published in: 2008 First International Conference on Emerging Trends in Engineering and Technology (2008-07-16)
Citations: 14

Abstract

Real-world data mining must cope with very large data sets, mixed data types (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), changing data, the use of background knowledge, and so on. The main goal of rough set analysis is the induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD: they offer mathematical tools to discover patterns hidden in data [4] and are therefore used in the field of data mining. Unlike fuzzy sets, which require membership values, or statistics, which requires probabilities, rough sets need no preliminary information; this is their distinguishing strength. Two novel algorithms for finding optimal reducts of condition attributes based on relative attribute dependency are implemented in Java 1.5: the first yields a simple reduct, while the second yields a reduct with the minimum number of attributes. The implementation serves as a prototype system organized into five modules. The first module extracts decision rules. The second computes positive regions for attribute dependencies. The third computes reducts, the minimal attribute sets needed to determine the decision, using two techniques: brute-force backward elimination, which examines the attributes in the given order to check whether each can be eliminated, and an information entropy-based algorithm, which computes the information entropy conveyed by each attribute and selects the one with the maximum information gain for elimination. The fourth module builds the equivalence classes used for classification, including the lower and upper approximations, which support hard computing and soft computing respectively. The last module builds the discernibility matrix and discernibility functions, which store the differences between attribute values for each pair of data tuples; rather than searching the entire training set, the matrix is searched to detect redundant attributes.
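The paper's prototype is written in Java 1.5 and its code is not reproduced here; as an illustrative sketch only (with a hypothetical toy decision table, not the authors' data), the core notions the modules rely on — equivalence classes, lower and upper approximations, positive regions, relative attribute dependency, and backward-elimination reducts — can be expressed compactly in Python:

```python
# Toy decision table (hypothetical): condition attributes 'headache' and
# 'temp', decision attribute 'flu'. Each row is one object of the universe U.
table = [
    {'headache': 'yes', 'temp': 'high',   'flu': 'yes'},
    {'headache': 'yes', 'temp': 'normal', 'flu': 'no'},
    {'headache': 'no',  'temp': 'high',   'flu': 'yes'},
    {'headache': 'no',  'temp': 'normal', 'flu': 'no'},
]

def partition(rows, attrs):
    """Equivalence classes of IND(attrs): two rows are indiscernible
    iff they agree on every attribute in attrs. Returns a list of
    sets of row indices."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(classes.values())

def lower_upper(rows, attrs, target):
    """Lower/upper approximation of a target set of row indices."""
    lower, upper = set(), set()
    for eq in partition(rows, attrs):
        if eq <= target:
            lower |= eq   # certainly in the concept (hard computing)
        if eq & target:
            upper |= eq   # possibly in the concept (soft computing)
    return lower, upper

def positive_region(rows, cond, dec):
    """POS_cond(dec): union of cond-classes contained in a dec-class."""
    dec_classes = partition(rows, dec)
    pos = set()
    for eq in partition(rows, cond):
        if any(eq <= d for d in dec_classes):
            pos |= eq
    return pos

def dependency(rows, cond, dec):
    """Relative attribute dependency: gamma = |POS_cond(dec)| / |U|."""
    return len(positive_region(rows, cond, dec)) / len(rows)

def backward_reduct(rows, cond, dec):
    """Brute-force backward elimination: drop each attribute, in the
    given order, whenever the dependency is unchanged without it."""
    base = dependency(rows, cond, dec)
    reduct = list(cond)
    for a in list(cond):
        trial = [x for x in reduct if x != a]
        if trial and dependency(rows, trial, dec) == base:
            reduct = trial
    return reduct
```

On this toy table, `temp` alone determines `flu`, so backward elimination discards `headache` and returns the single-attribute reduct.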
Together these modules constitute the system. The implemented system is first tested on a small application to verify the mathematical calculations involved, which would not be practical with a large database. It is also tested on a medium-sized application example to illustrate the usefulness of the system and the incorporated language.
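The last module's discernibility matrix can likewise be sketched. This is not the paper's Java code: it is a minimal Python illustration on an assumed toy table, showing how the matrix records, for each pair of objects with different decisions, the condition attributes that tell them apart, and how searching the matrix (rather than the whole training set) exposes redundant attributes:

```python
# Hypothetical toy decision table, as in the sketch above.
table = [
    {'headache': 'yes', 'temp': 'high',   'flu': 'yes'},
    {'headache': 'yes', 'temp': 'normal', 'flu': 'no'},
    {'headache': 'no',  'temp': 'high',   'flu': 'yes'},
    {'headache': 'no',  'temp': 'normal', 'flu': 'no'},
]
COND, DEC = ['headache', 'temp'], ['flu']

def discernibility_matrix(rows, cond, dec):
    """For every pair of objects with different decision values, record
    the set of condition attributes on which the pair differs."""
    matrix = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if any(rows[i][d] != rows[j][d] for d in dec):
                matrix[(i, j)] = frozenset(
                    a for a in cond if rows[i][a] != rows[j][a])
    return matrix

def is_redundant(matrix, attr):
    """attr is redundant iff removing it still leaves every pair
    discernible, i.e. no matrix entry relies on attr alone."""
    return all(entry - {attr} for entry in matrix.values() if entry)

m = discernibility_matrix(table, COND, DEC)
# The core (attributes in no reduct's complement) is the set of
# attributes that appear as singleton matrix entries.
core = {next(iter(e)) for e in m.values() if len(e) == 1}
```

Here every pair that disagrees on `flu` also disagrees on `temp`, so `headache` is redundant and the core is `{'temp'}` — found by scanning only the matrix entries, not the full training set.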