Decision tree construction for data mining on grid computing
Shun-Tzu Tsai, Chao-Tung Yang
IEEE International Conference on e-Technology, e-Commerce and e-Service, 2004. EEE '04. 2004
Published: 2004-03-28
DOI: 10.1109/EEE.2004.1287344 (https://doi.org/10.1109/EEE.2004.1287344)
Citations: 14
Abstract
Decision trees are among the most frequently used methods in data mining for extracting predictive information. Because their construction is well suited to parallelism, they have been widely adopted in high-performance computing, and various parallel decision tree algorithms have been developed to handle very large datasets and complex computation. Following developments in other technology fields, grid computing is regarded as an extension of the PC cluster, and its future research and development is therefore highly valued. This new wave is the third generation of Internet applications, following the traditional Internet and Web applications. We present a grid-based decision tree architecture that we hope can be applied to both parallel and sequential decision tree algorithms. In addition, based on the scope and model of data mining in a grid environment, as well as a user-equivalent perspective, we categorize grid roles into three types. We hope these definitions will help software developers define clear system processes and differentiate the application scope of their software. To realize the architecture, we first apply an existing parallel decision tree algorithm, SPRINT, in the grid environment. Performance and other differences are compared using datasets of different sizes. The experimental results serve as a reference for future development.