Information-theoretic evaluation of covariate distributions models.
Niklas Hartung, Aleksandra Khatova
Journal of Pharmacokinetics and Pharmacodynamics, 52(2):21 (published 2025-03-27)
DOI: 10.1007/s10928-025-09968-5
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11950120/pdf/
Abstract
Statistical modelling of covariate distributions makes it possible to generate virtual populations or to impute missing values in a covariate dataset. Covariate distributions typically have non-Gaussian margins and nonlinear correlation structures, which simple descriptions such as multivariate Gaussian distributions fail to represent. Prominent non-Gaussian frameworks for covariate distribution modelling are copula-based models and models based on multiple imputation by chained equations (MICE). While both frameworks have already found applications in the life sciences, a systematic investigation of their goodness-of-fit to the underlying theoretical distribution, identifying their strengths and weaknesses under different conditions, is still lacking. To bridge this gap, we thoroughly evaluated covariate distribution models in terms of Kullback-Leibler (KL) divergence, a scale-invariant, information-theoretic goodness-of-fit criterion for distributions. Methodologically, we proposed a new approach to construct confidence intervals for KL divergence by combining nearest-neighbour-based KL divergence estimators with subsampling-based uncertainty quantification. In relevant datasets of different sizes and dimensionalities, with both continuous and discrete covariates, non-Gaussian models consistently achieved lower KL divergence than simpler Gaussian or scale-transform approximations. KL divergence estimates were also robust to the inclusion of latent variables and to large fractions of missing values. While copula-based models generalized well to new data, MICE showed a tendency to overfit, so its performance should always be evaluated on separate test data. Parametric copula models and MICE scaled much better with the dimension of the dataset than nonparametric copula models. These findings corroborate the potential of non-Gaussian models for modelling realistic life-science covariate distributions.
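The abstract names the two ingredients of the confidence-interval approach (a nearest-neighbour KL divergence estimator and subsampling) but not their exact form. The following Python sketch shows one possible instance under stated assumptions; it is not the authors' implementation. It uses the 1-nearest-neighbour estimator of Wang, Kulkarni and Verdú and a Politis-Romano-style subsampling interval that assumes a √n convergence rate. The function names `knn_kl_divergence` and `subsampling_ci`, the subsample size `b`, and the toy Gaussian data are hypothetical.

```python
import math
import random


def knn_kl_divergence(x, y):
    """1-nearest-neighbour estimate of KL(P || Q) from samples x ~ P and y ~ Q.

    Estimator (Wang, Kulkarni and Verdu, k = 1):
        D = (d / n) * sum_i log(nu_i / rho_i) + log(m / (n - 1))
    rho_i: distance from x_i to its nearest *other* point in x,
    nu_i:  distance from x_i to its nearest point in y.
    Assumes continuous data without duplicate points (so rho_i > 0).
    """
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        rho = min(math.dist(xi, xj) for j, xj in enumerate(x) if j != i)
        nu = min(math.dist(xi, yj) for yj in y)
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def subsampling_ci(x, y, b, n_sub=200, alpha=0.05, rng=None):
    """Subsampling confidence interval for KL(P || Q) (hypothetical helper).

    ASSUMPTION: the estimator converges at rate sqrt(n), which may not hold
    for higher-dimensional data. Each subsample draws b points from x and a
    proportional number from y without replacement; sqrt(b) * (D_b - D_n)
    then stands in for the sampling distribution of sqrt(n) * (D_n - D).
    """
    if rng is None:
        rng = random.Random()
    n, m = len(x), len(y)
    b_y = max(1, round(b * m / n))  # keep the x:y size ratio in each subsample
    d_full = knn_kl_divergence(x, y)
    devs = sorted(
        math.sqrt(b) * (knn_kl_divergence(rng.sample(x, b), rng.sample(y, b_y)) - d_full)
        for _ in range(n_sub)
    )
    # Simple empirical quantiles of the rescaled deviations.
    q_lo = devs[int(alpha / 2 * (n_sub - 1))]
    q_hi = devs[int((1 - alpha / 2) * (n_sub - 1))]
    return d_full, (d_full - q_hi / math.sqrt(n), d_full - q_lo / math.sqrt(n))


# Hypothetical toy data: P = N((0, 0), I), Q = N((1, 0), I), so KL(P || Q) = 0.5.
rng = random.Random(0)
x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
y = [(rng.gauss(1, 1), rng.gauss(0, 1)) for _ in range(400)]
est, (lo, hi) = subsampling_ci(x, y, b=100, n_sub=100, rng=rng)
print(f"KL estimate {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Brute-force distances keep the sketch dependency-free but cost O(n·m) per estimate; a k-d tree (for example `scipy.spatial.cKDTree`) is the usual choice for realistic dataset sizes. Tied values, as produced by discrete covariates, give zero nearest-neighbour distances, which this sketch does not handle.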
About the journal:
Broadly speaking, the Journal of Pharmacokinetics and Pharmacodynamics covers the area of pharmacometrics. The journal is devoted to illustrating the importance of pharmacokinetics, pharmacodynamics, and pharmacometrics in drug development, clinical care, and the understanding of drug action. The journal publishes on a variety of topics related to pharmacometrics, including, but not limited to, clinical, experimental, and theoretical papers examining the kinetics of drug disposition and effects of drug action in humans, animals, in vitro, or in silico; modeling and simulation methodology, including optimal design; precision medicine; systems pharmacology; and mathematical pharmacology (including computational biology, bioengineering, and biophysics related to pharmacology, pharmacokinetics, or pharmacodynamics). Clinical papers that include population pharmacokinetic-pharmacodynamic relationships are welcome. The journal actively invites and promotes up-and-coming areas of pharmacometric research, such as real-world evidence, quality of life analyses, and artificial intelligence. The Journal of Pharmacokinetics and Pharmacodynamics is an official journal of the International Society of Pharmacometrics.