Hierarchical data modeling: A systematic comparison of statistical, tree-based, and neural network approaches

Impact factor: 4.9

Machine Learning with Applications. Pub Date: 2025-06-24. DOI: 10.1016/j.mlwa.2025.100688

Marzieh Amiri Shahbazi, Nasibeh Azadeh-Fard

Volume 21, Article 100688. Full text: https://www.sciencedirect.com/science/article/pii/S2666827025000714

Citations: 0

Abstract


Hierarchical modeling approaches have evolved significantly, yet comprehensive comparisons between fundamentally different methodological paradigms remain limited. This research presents a systematic comparative analysis of three distinct hierarchical modeling approaches: statistical (Hierarchical Mixed Model), tree-based (Hierarchical Random Forest), and neural (Hierarchical Neural Network). Based on the 2019 National Inpatient Sample — comprising more than seven million records from 4568 hospitals across four U.S. regions — the models were assessed for their ability to predict length of stay at the patient, hospital, and regional levels. The evaluation framework integrated quantitative metrics and qualitative factors, including analyses across varying sample sizes, simplified hierarchies, and a separate intensive-care dataset. Results demonstrate that tree-based approaches consistently outperform alternatives in predictive accuracy and explanation of variance while maintaining computational efficiency. These performance patterns remain generally consistent across sample sizes, simplified hierarchies, and the external dataset. Neural approaches excel at capturing group-level distinctions but require substantial computational resources and exhibit prediction bias. Statistical approaches offer rapid inference and interpretability but underperform in accuracy at intermediate hierarchical levels. Each model exhibits distinctive hierarchical information processing: neural models favor bottom-up flow, statistical models emphasize top-down constraints, and tree-based models achieve balanced integration. This research establishes practical guidelines for selecting appropriate hierarchical modeling approaches based on data characteristics, computational constraints, and analytical requirements, thereby advancing understanding of fundamental trade-offs in multilevel analysis.
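The abstract describes assessing predictions at the patient, hospital, and regional levels. As a minimal sketch of what a per-level evaluation could look like, the Python below computes RMSE on individual records and then on group means at each higher level. The toy records, the function names, and the choice of comparing group means are all assumptions made for illustration; the abstract does not state how the authors aggregated predictions or which metrics they used.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy records, not data from the study:
# (region, hospital_id, actual_los_days, predicted_los_days)
RECORDS = [
    ("Northeast", "H1", 3.0, 4.0),
    ("Northeast", "H1", 5.0, 4.0),
    ("Northeast", "H2", 2.0, 3.0),
    ("South", "H3", 6.0, 5.0),
    ("South", "H3", 8.0, 7.0),
    ("South", "H4", 4.0, 5.0),
]


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def group_means(records, key):
    """Return one (mean actual, mean predicted) pair per group."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for rec in records:
        acc = sums[key(rec)]
        acc[0] += rec[2]  # running sum of actual length of stay
        acc[1] += rec[3]  # running sum of predicted length of stay
        acc[2] += 1       # number of patients in the group
    return [(a / n, p / n) for a, p, n in sums.values()]


def rmse_by_level(records):
    """RMSE at three levels of an assumed region > hospital > patient hierarchy."""
    return {
        "patient": rmse((r[2], r[3]) for r in records),
        "hospital": rmse(group_means(records, key=lambda r: (r[0], r[1]))),
        "region": rmse(group_means(records, key=lambda r: r[0])),
    }


if __name__ == "__main__":
    for level, value in rmse_by_level(RECORDS).items():
        print(f"{level}: {value:.3f}")
```

In this toy data the error shrinks as records are averaged into larger groups, because over- and under-predictions partly cancel. That is one reason a model can rank differently at different levels of a hierarchy.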


Source journal: Machine Learning with Applications (Management Science and Operations Research; Artificial Intelligence; Computer Science Applications)

Self-citation rate: 0.00%

Review time: 98 days