Making data classification more effective: An automated deep forest model
Authors: Jingwei Guo, Xiang Guo, Yihui Tian, Hao Zhan, Zhen-Song Chen, Muhammet Deveci
Journal: Journal of Industrial Information Integration, volume 42, Article 100738
Publication date: 2024-11-01
DOI: 10.1016/j.jii.2024.100738
URL: https://www.sciencedirect.com/science/article/pii/S2452414X2400181X
Impact factor: 10.4 (JCR Q1, Computer Science, Interdisciplinary Applications)
Despite their low risk of overfitting, the deep forest model and its variants cannot automatically adapt to the features of the data; forest learner selection still relies on manual experience and comparative experiments. This study proposes an automated deep forest model (ATDF) that determines the types and numbers of forest learners automatically from the training data. The model introduces a forest learner variability measure based on normalized mutual information, which serves as the theoretical foundation for the automated process in deep forests. A novel hierarchical clustering algorithm based on normalized mutual information then groups forest learners at different granularities to determine the optimal forest learner types. This method can also determine the model structure of stacking models in general, including deep forests. Finally, with the goal of maximizing cross-validation scores, a Bayesian optimization algorithm based on the tree-structured Parzen estimator (TPE) determines the ideal number of forest learners of each type. Additionally, a standardized method for identifying forest learners is developed to guarantee the consistency of model outcomes. A series of comparative experiments on seven datasets from the UCI Machine Learning Repository confirmed the effectiveness and superiority of the proposed model. The results demonstrate that the proposed model adapts better to new data and tasks, offers a high level of automation, and performs excellently in classification.
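The abstract does not spell out the variability measure or the clustering procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that the variability between two forest learners is taken as 1 minus the normalized mutual information (NMI) of their predictions on the same samples, and that learners are merged by average-linkage agglomerative clustering until the closest pair of groups is farther apart than a chosen threshold. The function names, the learner names, the geometric-mean normalization, and the threshold value are all hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI of two label sequences, normalized by sqrt(H(a) * H(b)) (an assumed choice)."""
    n = len(a)
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A constant prediction carries no information; treat two constant
        # sequences as fully redundant and a constant/non-constant pair as unrelated.
        return 1.0 if ha == hb else 0.0
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return mi / sqrt(ha * hb)


def cluster_learners(preds, threshold):
    """Average-linkage agglomerative clustering of learners with distance 1 - NMI.

    preds maps a learner name to its list of predicted labels.
    Merging stops once the closest pair of clusters is farther apart than threshold.
    """
    names = list(preds)
    dist = {(p, q): 1.0 - nmi(preds[p], preds[q])
            for p in names for q in names if p != q}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[(p, q)] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical predictions of four forest learners on eight validation samples.
preds = {
    "rf_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "rf_b": [0, 0, 1, 1, 0, 1, 0, 1],  # same predictions as rf_a
    "et_a": [0, 1, 0, 1, 0, 1, 0, 1],
    "et_b": [0, 1, 0, 1, 0, 1, 0, 1],  # same predictions as et_a
}
groups = cluster_learners(preds, threshold=0.5)
print(groups)
```

In this toy input the two pairs of identical learners are merged, while the two resulting groups stay separate because their predictions share little information. Varying the threshold gives coarser or finer groupings, which is one simple way to read "different granularities".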
About the journal:
The Journal of Industrial Information Integration focuses on the industry's transition towards industrial integration and informatization, covering not only hardware and software but also information integration. It serves as a platform for promoting advances in industrial information integration, addressing challenges, issues, and solutions in an interdisciplinary forum for researchers, practitioners, and policy makers.
The Journal of Industrial Information Integration welcomes papers on foundational, technical, and practical aspects of industrial information integration, emphasizing the complex and cross-disciplinary topics that arise in industrial integration. Techniques from mathematical science, computer science, computer engineering, electrical and electronic engineering, manufacturing engineering, and engineering management are crucial in this context.