Combining Continuous and Categorical Data Modeling in Developmental Age Estimation Using Hierarchical Bayes

Valerie Sgheiza
Forensic Anthropology, 6(1). Journal Article. Published 2022-12-02. DOI: 10.5744/fa.2022.0016 (https://doi.org/10.5744/fa.2022.0016)

Abstract

Residual correlations between variables (correlations that persist after accounting for the effect of chronological age) can have a significant impact on final age estimates. When not accounted for, such correlations can result in overly narrow age intervals and high error rates. Modeling correlations across mixed data types can be mathematically problematic. Hierarchical modeling can incorporate continuous and categorical traits into a single model that accounts for correlated variables while reducing computationally expensive calculations. This paper demonstrates a Bayesian hierarchical modeling approach in which trait variables were grouped by data type or bodily system and used to produce separate age estimates with any appropriate model. These age estimates were combined into a single estimate using a multivariate normal model via nested cross-validation. The data included nine diaphyseal length measurements and 29 epiphyseal fusion and ossification sites from 179 individuals in the publicly available U.S. Subadult Virtual Anthropology Database. Diaphyseal ages were modeled with linear regression and epiphyseal ages with random forest regression. Age estimates from the hierarchical model had reduced bias relative to diaphyseal or epiphyseal maximum likelihood estimates alone. Combined-indicator age intervals from 95% highest density regions (HDRs) were on average 15% narrower than those from diaphyseal 95% HDRs, while success rates were 2 percentage points lower (91% vs. 93%). Functional example code is provided. A general hierarchical modeling approach may be applicable to other areas of skeletal analysis that employ correlated variables of mixed data types, including adult age estimation and ancestry estimation.
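The combination step described in the abstract can be illustrated with a minimal sketch. This is not the paper's example code: the function names, the uniform prior over an age grid, and every numeric value below are assumptions made for illustration only. The sketch treats two group-level age estimates (for instance, one diaphyseal and one epiphyseal) as jointly normal around the true age with a residual covariance matrix, computes the posterior over age on a grid, and reads off a 95% highest density region. It then repeats the calculation with the residual correlation set to zero, which shows how ignoring the correlation narrows the interval.

```python
import numpy as np

def age_posterior(estimates, resid_cov, grid):
    """Posterior over age on a grid, given correlated group-level estimates.

    Assumption: estimates | age ~ MVN(age * 1, resid_cov), uniform prior on grid.
    """
    estimates = np.asarray(estimates, dtype=float)
    prec = np.linalg.inv(resid_cov)
    # Residual of each estimate from each candidate age: shape (n_grid, n_groups)
    diff = estimates[None, :] - grid[:, None]
    # Quadratic form diff @ prec @ diff for every grid point
    log_lik = -0.5 * np.einsum("ij,jk,ik->i", diff, prec, diff)
    post = np.exp(log_lik - log_lik.max())  # subtract max for numerical stability
    return post / post.sum()

def hdr_bounds(grid, post, mass=0.95):
    """Bounds of the smallest set of grid points holding `mass` of the posterior.

    For a unimodal posterior this set is a single interval.
    """
    order = np.argsort(post)[::-1]           # grid points by decreasing density
    cum = np.cumsum(post[order])
    k = np.searchsorted(cum, mass) + 1       # number of points needed
    keep = order[:k]
    return grid[keep].min(), grid[keep].max()

# Hypothetical values: two age estimates (years) and their residual structure
estimates = [5.0, 6.0]
resid_sd = np.array([1.0, 1.5])
rho = 0.6                                    # assumed residual correlation
corr = np.array([[1.0, rho], [rho, 1.0]])
cov_correlated = corr * np.outer(resid_sd, resid_sd)
cov_independent = np.diag(resid_sd ** 2)     # same variances, correlation ignored

grid = np.linspace(0.0, 20.0, 2001)
post_corr = age_posterior(estimates, cov_correlated, grid)
post_indep = age_posterior(estimates, cov_independent, grid)

lo_c, hi_c = hdr_bounds(grid, post_corr)
lo_i, hi_i = hdr_bounds(grid, post_indep)
width_corr = hi_c - lo_c
width_indep = hi_i - lo_i
print(f"95% HDR with correlation:    {lo_c:.2f} to {hi_c:.2f}")
print(f"95% HDR ignoring correlation: {lo_i:.2f} to {hi_i:.2f}")
```

In practice the residual covariance would be estimated from training data (for example, from the differences between group-level estimates and known ages within cross-validation folds) rather than fixed by hand as it is here.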