An online software for decision tree classification and visualization using c4.5 algorithm (ODTC)

Suvajit Das, Shashi Dahiya, Anshu Bharadwaj
2014 International Conference on Computing for Sustainable Global Development (INDIACom), published 2014-03-05
DOI: 10.1109/INDIACOM.2014.6828107 (https://doi.org/10.1109/INDIACOM.2014.6828107)
Citations: 10

Abstract

Classification is an important and widely performed data mining task. It is a predictive modelling task, defined as building a model of the target variable as a function of the explanatory variables. Many well-established classification techniques exist, and the decision tree is among the most important and popular from the machine learning domain. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs and utility. C4.5 is a well-known decision tree algorithm for classifying datasets; it is Quinlan's extension of his own ID3 algorithm. It induces decision trees and generates rules from datasets that may contain categorical and/or numerical attributes. The rules can be used to predict the categorical value of an attribute for new records. C4.5 performs well both in classifying datasets and in generating useful rules. This paper discusses a web-based software for rule generation and decision tree induction using the C4.5 algorithm. Visualizing the result as a tree structure makes the generated rules easier to understand. The software includes a feature to impute missing values in the data. Input data can be categorical, numerical, or both. The software can import data files in TXT, XLS and CSV formats. An enhanced waterfall model was used for the software development process. The software will be useful for academicians, researchers and students working in data mining, agriculture and other fields where huge amounts of data are generated.
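The abstract does not describe how ODTC implements C4.5, so the following is only a minimal sketch of the split criterion that C4.5 is generally known for, the gain ratio, applied to categorical attributes. The function names (`entropy`, `gain_ratio`) and the toy records are hypothetical and are not taken from the software.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, invented for illustration only.
records = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
best = max(["outlook", "windy"], key=lambda a: gain_ratio(records, a, "play"))
print(best)  # prints "outlook"
```

In this toy data `outlook` separates the classes perfectly, so it would become the root test and yield rules such as "if outlook is sunny then play is no". Numerical attributes, missing-value handling and pruning, which a full C4.5 implementation also covers, are left out of this sketch.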