大数据中的联合模型：纵向电子健康记录中所需数据质量的基于仿真的指南。

IF 6.1 3区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining Pub Date : 2025-05-13 DOI:10.1186/s13040-025-00450-z

Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke

{"title":"大数据中的联合模型：纵向电子健康记录中所需数据质量的基于仿真的指南。","authors":"Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke","doi":"10.1186/s13040-025-00450-z","DOIUrl":null,"url":null,"abstract":"Background: Over the past decade an increase in usage of electronic health data (EHR) by office-based physicians and hospitals has been reported. However, these data types come with challenge regarding completeness and data quality and it is, especially for more complex models, unclear how these characteristics influence the performance.Methods: In this paper, we focus on joint models which combines longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the necessary quality of longitudinal EHR data so that joint models perform better than cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.Results: Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"35"},"PeriodicalIF":6.1000,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070788/pdf/","citationCount":"0","resultStr":"{\"title\":\"Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.\",\"authors\":\"Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke\",\"doi\":\"10.1186/s13040-025-00450-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Over the past decade an increase in usage of electronic health data (EHR) by office-based physicians and hospitals has been reported. However, these data types come with challenge regarding completeness and data quality and it is, especially for more complex models, unclear how these characteristics influence the performance.Methods: In this paper, we focus on joint models which combines longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the necessary quality of longitudinal EHR data so that joint models perform better than cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.Results: Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.\",\"PeriodicalId\":48947,\"journal\":{\"name\":\"Biodata Mining\",\"volume\":\"18 1\",\"pages\":\"35\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2025-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070788/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biodata Mining\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13040-025-00450-z\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-025-00450-z","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

背景：据报道，在过去十年中，办公室医生和医院对电子健康数据（EHR）的使用有所增加。然而，这些数据类型带来了完整性和数据质量方面的挑战，特别是对于更复杂的模型，不清楚这些特征如何影响性能。方法：采用纵向模型和生存模型相结合的联合模型，将所有可用信息纳入模型中。本文的目的是为纵向电子病历数据的必要质量建立基于仿真的指导方针，使联合模型比cox模型表现得更好。我们通过系统和透明地改变数据质量的不同特征，例如测量频率、噪声和患者之间的异质性，进行了广泛的模拟研究。我们应用联合模型并评估其相对于传统Cox生存建模技术的性能。结果：关键发现表明，疾病发病前的生物标志物变化必须在相似的患者组中保持一致。随着噪声的增加和测量密度的提高，联合模型在模型性能上超过了传统的Cox回归模型。我们用两个现实世界的例子来说明指南的有用性和局限性，即血清胆红素对原发性胆汁性肝硬化的影响和估计肾小球滤过率对慢性肾脏疾病的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.

查看原文本刊更多论文

Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.

Background: Over the past decade an increase in usage of electronic health data (EHR) by office-based physicians and hospitals has been reported. However, these data types come with challenge regarding completeness and data quality and it is, especially for more complex models, unclear how these characteristics influence the performance.

Methods: In this paper, we focus on joint models which combines longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the necessary quality of longitudinal EHR data so that joint models perform better than cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.

Results: Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Biodata Mining MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

7.90

自引率

0.00%

发文量

审稿时长

23 weeks

期刊介绍： BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to: -Development, evaluation, and application of novel data mining and machine learning algorithms. -Adaptation, evaluation, and application of traditional data mining and machine learning algorithms. -Open-source software for the application of data mining and machine learning algorithms. -Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies. -Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.