Large-Scale Evaluation and Liver Disease Risk Prediction in Finland's National Electronic Health Record System: Feasibility Study Using Real-World Data.

IF 3.8, Zone 3 (Medicine), JCR Q2 MEDICAL INFORMATICS

JMIR Medical Informatics, Pub Date: 2025-04-02, DOI: 10.2196/62978

Viljami Männikkö, Janne Tommola, Emmi Tikkanen, Olli-Pekka Hätinen, Fredrik Åberg

Volume 13, e62978. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004021/pdf/

Citations: 0

Abstract

Background: Globally, the incidence and mortality of chronic liver disease are escalating. Early detection of liver disease remains a challenge: diagnosis often occurs only at symptomatic stages, when preventive measures are less effective. The Chronic Liver Disease score (CLivD) is a predictive risk model developed using Finnish health care data to forecast an individual's risk of developing chronic liver disease in subsequent years. The Kanta Service is Finland's national electronic health record system; it stores comprehensive health care data, including patient medical histories, prescriptions, and laboratory results, to facilitate health care delivery and research.

Objective: This study aimed to evaluate the feasibility of calculating the CLivD score automatically on the current Kanta platform, and to identify and suggest improvements to Kanta that would enable accurate automatic risk detection.

Methods: In this study, a real-world data repository (Kanta) was used as the data source for the CLivD risk calculation model. Our dataset consisted of the complete medical histories of 96,200 individuals from Kanta. To work with real-world data, we designed processes to handle missing inputs in the risk calculation.
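The missing-input handling described in the Methods can be illustrated with a minimal sketch. It assumes a hypothetical record layout (`structured`, `diagnoses`, `notes`) and hypothetical helper names (`structured`, `diagnosis`, `free_text`, `resolve_inputs`); it is not the Kanta data model or the authors' pipeline, and it does not implement the CLivD formula itself. Each input variable gets an ordered list of sources (a structured field, then a diagnosis-code approximation, then a free-text match), the first source that yields a value is used, and variables with no usable source are reported as missing. The ICD-10 prefixes and the regular expression are illustrative choices only.

```python
import re
from typing import Callable, Optional

# Hypothetical record layout (an assumption, not the Kanta data model):
# {"structured": {field: value}, "diagnoses": [ICD-10 codes], "notes": [free text]}
Record = dict
Resolver = Callable[[Record], Optional[float]]


def structured(field: str) -> Resolver:
    """Read a value directly from structured data, if present."""
    return lambda rec: rec.get("structured", {}).get(field)


def diagnosis(prefixes: tuple[str, ...], value: float) -> Resolver:
    """Approximate a variable from diagnosis codes starting with any given prefix."""
    def resolve(rec: Record) -> Optional[float]:
        if any(code.startswith(prefixes) for code in rec.get("diagnoses", [])):
            return value
        return None
    return resolve


def free_text(pattern: str, value: float) -> Resolver:
    """Approximate a variable from a case-insensitive regex match in free-text notes."""
    regex = re.compile(pattern, re.IGNORECASE)

    def resolve(rec: Record) -> Optional[float]:
        if any(regex.search(note) for note in rec.get("notes", [])):
            return value
        return None
    return resolve


def resolve_inputs(rec: Record, plan: dict[str, list[Resolver]]) -> tuple[dict[str, float], list[str]]:
    """For each variable, take the first resolver that returns a value; collect the rest as missing."""
    inputs: dict[str, float] = {}
    missing: list[str] = []
    for name, resolvers in plan.items():
        for resolver in resolvers:
            value = resolver(rec)
            if value is not None:
                inputs[name] = value
                break
        else:  # no resolver produced a value
            missing.append(name)
    return inputs, missing


# Illustrative plan: ICD-10 prefixes and the regex are examples, not the study's rules.
PLAN: dict[str, list[Resolver]] = {
    "diabetes": [structured("diabetes"), diagnosis(("E10", "E11"), 1.0)],
    "smoking": [structured("smoking"), diagnosis(("F17",), 1.0),
                free_text(r"\bsmok(es|er|ing)\b", 1.0)],
    "waist_hip_ratio": [structured("waist_hip_ratio")],
}

patient = {"structured": {}, "diagnoses": ["E11.9"], "notes": ["Patient smokes 10 cigarettes a day."]}
inputs, missing = resolve_inputs(patient, PLAN)
print(inputs, missing)  # {'diabetes': 1.0, 'smoking': 1.0} ['waist_hip_ratio']
```

In this sketch `waist_hip_ratio` has only a structured source, so it stays missing whenever that field is absent. Note also that a plain keyword match ignores negation and context (for example, "non-smoker" matches the pattern above), so a realistic free-text step would need proper clinical text processing rather than a single regular expression.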

Results: We found that Kanta currently lacks many of the CLivD input parameters in the structured format required to calculate precise risk scores. However, the risk scores can be improved by using the unstructured text in patient reports and by approximating variables from other health data, such as diagnosis information. Using structured data alone, we were able to identify only 33 of 51,275 individuals (<1%) in the "low risk" category and 308 of 51,275 individuals (<1%) in the "moderate risk" category. After adding diagnosis-based approximation and free-text use, we identified 18,895 of 51,275 individuals (37%) in the "low risk" category and 2125 of 51,275 individuals (4%) in the "moderate risk" category. In both cases, we could not identify any individuals in the "high risk" category because waist-hip ratio measurements were missing. We evaluated 3 scenarios for improving the coverage of waist-hip ratio data in Kanta, and these yielded the most substantial improvement in prediction accuracy.
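The percentages in the Results can be checked with a short calculation. The counts below are taken from the abstract and use the 51,275 denominator stated there; the scenario labels and the "any category" total (the sum of low, moderate, and high) are our own shorthand, not figures reported by the authors.

```python
TOTAL = 51_275  # denominator used for every category in the Results

# Individuals assigned to each risk category, as reported in the abstract.
SCENARIOS = {
    "structured data only": {"low": 33, "moderate": 308, "high": 0},
    "plus diagnosis approximation and free text": {"low": 18_895, "moderate": 2_125, "high": 0},
}


def percent(count: int, total: int = TOTAL) -> float:
    """Share of the cohort, in percent."""
    return 100.0 * count / total


def summarize(scenarios: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Percent of the cohort per category, plus the share assigned to any category."""
    summary = {}
    for name, by_category in scenarios.items():
        row = {cat: percent(n) for cat, n in by_category.items()}
        row["any category"] = percent(sum(by_category.values()))
        summary[name] = row
    return summary


if __name__ == "__main__":
    for name, row in summarize(SCENARIOS).items():
        cells = ", ".join(f"{cat}: {pct:.2f}%" for cat, pct in row.items())
        print(f"{name}: {cells}")
```

By this arithmetic, the approximations raise the share of the cohort assigned to any category from under 1% to roughly 41%.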

Conclusions: We conclude that the structured data currently in Kanta are insufficient for precise calculation of the CLivD score, or of risk scores for other diseases in which obesity, smoking, and alcohol use are important risk factors. Our simulations show up to a 14% improvement in risk detection when support for missing input variables is added. Kanta shows potential for implementing nationwide automated risk detection models that could improve disease prevention and public health.


Source journal: JMIR Medical Informatics (Medicine: Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.