Information-theoretic evaluation of covariate distributions models.

IF 2.2 | Zone 4 (Medicine) | JCR Q3, PHARMACOLOGY & PHARMACY
Niklas Hartung, Aleksandra Khatova
Journal of Pharmacokinetics and Pharmacodynamics, vol. 52, no. 2, p. 21. Published 2025-03-27. Publication type: Journal Article.
DOI: 10.1007/s10928-025-09968-5 (https://doi.org/10.1007/s10928-025-09968-5)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11950120/pdf/

Abstract

Statistical modelling of covariate distributions makes it possible to generate virtual populations or to impute missing values in a covariate dataset. Covariate distributions typically have non-Gaussian margins and show nonlinear correlation structures, which simple descriptions such as multivariate Gaussian distributions fail to represent. Prominent non-Gaussian frameworks for covariate distribution modelling are copula-based models and models based on multiple imputation by chained equations (MICE). While both frameworks have already found applications in the life sciences, a systematic investigation of their goodness-of-fit to the underlying theoretical distribution, indicating strengths and weaknesses under different conditions, is still lacking. To bridge this gap, we thoroughly evaluated covariate distribution models in terms of Kullback-Leibler (KL) divergence, a scale-invariant information-theoretic goodness-of-fit criterion for distributions. Methodologically, we proposed a new approach to constructing confidence intervals for KL divergence by combining nearest neighbour-based KL divergence estimators with subsampling-based uncertainty quantification. In relevant datasets of different sizes and dimensionalities with both continuous and discrete covariates, non-Gaussian models showed consistent improvements in KL divergence compared to simpler Gaussian or scale transform approximations. KL divergence estimates were also robust to the inclusion of latent variables and to large fractions of missing values. While copula-based models generalized well to new data, MICE showed a tendency to overfit, and its performance should always be evaluated on separate test data. Parametric copula models and MICE were found to scale much better with the dimension of the dataset than nonparametric copula models. These findings corroborate the potential of non-Gaussian models for modelling realistic life science covariate distributions.
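
To make the evaluation idea concrete, the following is a minimal sketch of a k-nearest-neighbour KL divergence estimator combined with a generic subsampling confidence interval. It is not the authors' implementation: the estimator form (the common k-NN formula based on neighbour-distance ratios), the assumed square-root convergence rate, the subsample fraction, and the function names `knn_kl_divergence` and `subsampling_ci` are all assumptions made for illustration. The sketch also assumes continuous covariates without tied values, whereas the abstract mentions discrete covariates as well.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_kl_divergence(x, y, k=1):
    """k-NN estimate of KL(P || Q) from samples x ~ P and y ~ Q.

    x and y are 2-D arrays of shape (n, d) and (m, d).
    Assumes continuous data without ties (zero distances break the logs).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]
    # Distance from each x_i to its k-th neighbour among the other x points;
    # query k + 1 neighbours because the nearest one is x_i itself.
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # Distance from each x_i to its k-th neighbour among the y points.
    nu = cKDTree(y).query(x, k=k)[0].reshape(n, -1)[:, -1]
    return d / n * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))


def subsampling_ci(x, y, k=1, frac=0.5, n_sub=200, alpha=0.05, seed=0):
    """Generic subsampling interval for the k-NN KL estimate.

    ASSUMPTION: the estimator converges at a sqrt(sample size) rate;
    this choice is illustrative and not taken from the paper.
    Returns (point_estimate, lower, upper).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    bx, by = int(frac * n), int(frac * m)
    full = knn_kl_divergence(x, y, k)
    stats = np.empty(n_sub)
    for i in range(n_sub):
        # Subsamples are drawn WITHOUT replacement, so no ties are created.
        xs = x[rng.choice(n, size=bx, replace=False)]
        ys = y[rng.choice(m, size=by, replace=False)]
        stats[i] = np.sqrt(bx) * (knn_kl_divergence(xs, ys, k) - full)
    q_lo, q_hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return full, full - q_hi / np.sqrt(n), full - q_lo / np.sqrt(n)
```

In such a setup, `x` could be held-out observed covariates and `y` a sample simulated from a fitted covariate model, so that a smaller estimate indicates a better fit. Drawing subsamples without replacement, rather than bootstrapping with replacement, avoids duplicated points, which would produce zero nearest-neighbour distances.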

Source journal
CiteScore: 4.90
Self-citation rate: 4.00%
Articles published: 39
Review time: 6-12 weeks
Journal description: Broadly speaking, the Journal of Pharmacokinetics and Pharmacodynamics covers the area of pharmacometrics. The journal is devoted to illustrating the importance of pharmacokinetics, pharmacodynamics, and pharmacometrics in drug development, clinical care, and the understanding of drug action. The journal publishes on a variety of topics related to pharmacometrics, including, but not limited to, clinical, experimental, and theoretical papers examining the kinetics of drug disposition and effects of drug action in humans, animals, in vitro, or in silico; modeling and simulation methodology, including optimal design; precision medicine; systems pharmacology; and mathematical pharmacology (including computational biology, bioengineering, and biophysics related to pharmacology, pharmacokinetics, or pharmacodynamics). Clinical papers that include population pharmacokinetic-pharmacodynamic relationships are welcome. The journal actively invites and promotes up-and-coming areas of pharmacometric research, such as real-world evidence, quality of life analyses, and artificial intelligence. The Journal of Pharmacokinetics and Pharmacodynamics is an official journal of the International Society of Pharmacometrics.