Hierarchical data modeling: A systematic comparison of statistical, tree-based, and neural network approaches

Marzieh Amiri Shahbazi, Nasibeh Azadeh-Fard

Machine learning with applications, volume 21, Article 100688. Published 2025-06-24. DOI: 10.1016/j.mlwa.2025.100688
https://www.sciencedirect.com/science/article/pii/S2666827025000714
Hierarchical modeling approaches have evolved significantly, yet comprehensive comparisons between fundamentally different methodological paradigms remain limited. This research presents a systematic comparative analysis of three distinct hierarchical modeling approaches: statistical (Hierarchical Mixed Model), tree-based (Hierarchical Random Forest), and neural (Hierarchical Neural Network). Based on the 2019 National Inpatient Sample — comprising more than seven million records from 4568 hospitals across four U.S. regions — the models were assessed for their ability to predict length of stay at the patient, hospital, and regional levels. The evaluation framework integrated quantitative metrics and qualitative factors, including analyses across varying sample sizes, simplified hierarchies, and a separate intensive-care dataset. Results demonstrate that tree-based approaches consistently outperform alternatives in predictive accuracy and explanation of variance while maintaining computational efficiency. These performance patterns remain generally consistent across sample sizes, simplified hierarchies, and the external dataset. Neural approaches excel at capturing group-level distinctions but require substantial computational resources and exhibit prediction bias. Statistical approaches offer rapid inference and interpretability but underperform in accuracy at intermediate hierarchical levels. Each model exhibits distinctive hierarchical information processing: neural models favor bottom-up flow, statistical models emphasize top-down constraints, and tree-based models achieve balanced integration. This research establishes practical guidelines for selecting appropriate hierarchical modeling approaches based on data characteristics, computational constraints, and analytical requirements, thereby advancing understanding of fundamental trade-offs in multilevel analysis.
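To make the idea of assessing predictions "at the patient, hospital, and regional levels" concrete, here is a minimal sketch of one way such a multilevel evaluation could be set up. It is not the authors' evaluation code. The abstract does not name its metrics, so the use of RMSE is an assumption, as are the record fields (`region`, `hospital`, `actual`, `predicted`), the helper names, and the toy values.

```python
from collections import defaultdict
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def group_means(records, key):
    """Average actual and predicted length of stay within each group.

    Assumes the values of `key` are unique identifiers (for example,
    hospital IDs that are not reused across regions).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # actual sum, predicted sum, count
    for r in records:
        s = sums[r[key]]
        s[0] += r["actual"]
        s[1] += r["predicted"]
        s[2] += 1
    return [(a / n, p / n) for a, p, n in sums.values()]


def multilevel_rmse(records):
    """RMSE on raw patient records and on hospital- and region-level means."""
    return {
        "patient": rmse((r["actual"], r["predicted"]) for r in records),
        "hospital": rmse(group_means(records, "hospital")),
        "region": rmse(group_means(records, "region")),
    }


# Hypothetical toy data: length of stay in days, not taken from the study.
records = [
    {"region": "A", "hospital": "h1", "actual": 3.0, "predicted": 4.0},
    {"region": "A", "hospital": "h1", "actual": 5.0, "predicted": 4.0},
    {"region": "A", "hospital": "h2", "actual": 2.0, "predicted": 3.0},
    {"region": "B", "hospital": "h3", "actual": 6.0, "predicted": 4.0},
]
scores = multilevel_rmse(records)
for level, value in scores.items():
    print(f"{level}: {value:.3f}")
```

In this toy data the two errors for hospital `h1` cancel when averaged, so its group-level error is zero even though both patient-level predictions are off by a day. This illustrates why a model's ranking can differ between hierarchical levels.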