Stacked Generalization of Random Forest and Decision Tree Techniques for Library Data Visualization

International Journal of Engineering and Applied Computer Science. Pub Date: 2022-05-20. DOI: 10.24032/ijeacs/0404/005

Stanley Ziweritin



Abstract

The volume of library data stored in the research and statistics centers of modern organizations grows daily. As these databases grow exponentially over time, it becomes exceptionally difficult to understand the behavior of the data and to interpret the relationships that exist between attributes. This growth poses new organizational challenges: conventional record management infrastructure can no longer provide precise and detailed information about how the data behaves over time. Selecting tools that can support multi-dimensional big data visualization is a source of confusion and a new concern. Viewing all related data in a database at once is a problem that has attracted the interest of data professionals with machine learning skills. It remains a lingering issue in the data industry because existing techniques cannot remove or filter noise from relevant data and fill in missing values to obtain the required information. The aim is to develop a stacked generalization model that combines the functionality of random forest (RF) and decision tree (DT) for library database visualization. In this paper, the random forest and decision tree techniques were employed to visualize large amounts of school library data. The proposed system was implemented with a few lines of Python code to create visualizations that help users understand and interpret, at a glance, the behavior of the data and its relationships. The model was trained and tested with cross-validation to learn and extract hidden patterns in the data. The two models were combined to form a stacked generalization model that performed better than the individual techniques. The stacked model produced 95%, followed by the RF, which produced a 95% accuracy rate and a 0.223600 RMSE value, in comparison with the DT, which recorded an 80.00% success rate and a 0.15990 RMSE value.
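The stacking setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the school library records, and picks logistic regression as the meta-learner, which the abstract does not specify. The hyperparameters and the 80/20 split are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the library records (assumption: the paper's
# dataset is not available, so binary-labelled toy data is generated).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Level-0 learners: the two techniques named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# Level-1 (meta) learner trained on cross-validated predictions of the
# base learners. Logistic regression is an assumed choice.
stacked = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)

# Evaluate each base learner alone and the stacked model on the same split.
models = dict(base_learners, stacked=stacked)
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    acc = accuracy_score(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    results[name] = (acc, rmse)
    print(f"{name:8s} accuracy={acc:.4f} RMSE={rmse:.6f}")
```

When RMSE is computed on hard 0/1 predictions, as in this sketch, it equals the square root of the error rate, so a 95% accuracy corresponds to an RMSE of about 0.2236. How the paper computed its RMSE values is not stated in the abstract.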

