Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data

IF 3.6; Zone 2, Medicine; JCR Q1, Psychiatry
Dianbo Liu, Karmel W. Choi, Paulo Lizano, William Yuan, Kun-Hsing Yu, Jordan Smoller, Isaac Kohane
Schizophrenia Research, volume 283, pages 59-66. Journal article, published 2025-07-07.
DOI: 10.1016/j.schres.2025.06.024
https://www.sciencedirect.com/science/article/pii/S0920996425002403
Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data

Importance

Severe mental illnesses (SMIs) affect approximately 3% of the United States population. The ability to screen for SMI risk at large scale could inform early prevention and treatment.

Objective

A scalable machine-learning-based tool was developed to conduct population-level risk screening for SMIs, including schizophrenia, schizoaffective disorders, psychosis, and bipolar disorders, using (1) healthcare insurance claims and (2) electronic health records (EHRs).

Design, setting and participants

Data were drawn from beneficiaries of a nationwide commercial healthcare insurer with 77.4 million members and from the EHRs of patients at eight U.S. academic hospitals. First, predictive models were constructed and tested on case-control cohorts drawn from insurance claims or EHR data. Second, the performance of the predictive models across data sources was analyzed. Third, as an illustrative application, the models were further trained to predict the risk of SMIs among 18-year-old young adults and among individuals with substance-associated conditions.
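
The second step, evaluating models across data sources, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the cohorts are synthetic, the one-line "model" is a toy stand-in for whatever learner the study used, and the names `auc`, `fit_mean_diff`, `make_cohort`, and `cross_source_auc` are hypothetical.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve: the fraction of case/control pairs in which
    the case scores higher (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def fit_mean_diff(rows):
    """Toy model (assumption): weight each feature by case mean minus control mean."""
    cases = [x for x, y in rows if y == 1]
    ctrls = [x for x, y in rows if y == 0]
    weights = []
    for j in range(len(rows[0][0])):
        case_mean = sum(x[j] for x in cases) / len(cases)
        ctrl_mean = sum(x[j] for x in ctrls) / len(ctrls)
        weights.append(case_mean - ctrl_mean)
    return weights

def score(weights, x):
    """Linear risk score for one feature vector."""
    return sum(w * v for w, v in zip(weights, x))

def make_cohort(rng, n, shift):
    """Synthetic case-control rows; `shift` mimics a source-specific offset."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.2 else 0
        x = [rng.gauss(y * 1.0 + shift, 1.0), rng.gauss(y * 0.5, 1.0)]
        rows.append((x, y))
    return rows[: n // 2], rows[n // 2 :]  # (train split, held-out test split)

def cross_source_auc(cohorts):
    """Train on each source's train split, test on every source's test split."""
    results = {}
    for train_name, (train_rows, _) in cohorts.items():
        weights = fit_mean_diff(train_rows)
        for test_name, (_, test_rows) in cohorts.items():
            labels = [y for _, y in test_rows]
            scores = [score(weights, x) for x, _ in test_rows]
            results[(train_name, test_name)] = auc(labels, scores)
    return results

rng = random.Random(0)
cohorts = {"claims": make_cohort(rng, 400, 0.0), "ehr": make_cohort(rng, 400, 0.3)}
results = cross_source_auc(cohorts)
for pair, value in sorted(results.items()):
    print(pair, round(value, 3))
```

The off-diagonal entries, such as `("claims", "ehr")`, are the cross-source results; comparing them with the same-source entries indicates how well a model transfers.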

Main outcomes and measures

Machine-learning-based predictive models for SMIs in the general population were built from insurance claims and EHR data.

Results

A total of 301,221 patients with SMIs and 2,439,890 control individuals were retrieved from the nationwide U.S. health insurance claims database. A total of 59,319 patients with SMIs and 297,993 control individuals were retrieved from EHRs spanning eight hospitals in a major integrated healthcare system in Massachusetts, U.S. On an independent test set from an all-age case-control cohort, the predictive models for SMIs achieved an AUCROC of 0.76, specificity of 79.1%, and sensitivity of 61.9% using insurance claims data, and an AUCROC of 0.83, specificity of 85.1%, and sensitivity of 66.4% using EHR data. The models fine-tuned for specific use cases outperformed two rule-based benchmark methods when predicting 12-month risk of SMIs among 18-year-old young adults, but performed worse than the benchmark methods when predicting SMIs among individuals with substance-associated conditions in claims data.
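
To make the reported operating points concrete, the sketch below defines sensitivity and specificity from confusion-matrix counts and applies Bayes' rule to estimate the positive predictive value (PPV) at a given prevalence. It rests on a strong assumption that the abstract does not make: that the sensitivity and specificity measured in the case-control test sets would carry over unchanged to a population with the roughly 3% prevalence quoted above. The function names are hypothetical, and the output is illustrative arithmetic, not a result from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true cases that the model flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true controls that the model leaves unflagged."""
    return tn / (tn + fp)

def ppv_at_prevalence(sens, spec, prevalence):
    """Bayes' rule: probability that a flagged individual is a true case."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# (sensitivity, specificity) as reported in the abstract for each data source.
reported = {"claims": (0.619, 0.791), "ehr": (0.664, 0.851)}
prevalence = 0.03  # assumption: population prevalence from the Importance paragraph

ppv = {
    name: ppv_at_prevalence(sens, spec, prevalence)
    for name, (sens, spec) in reported.items()
}
for name, value in ppv.items():
    print(f"{name}: PPV at {prevalence:.0%} prevalence = {value:.3f}")
```

Because PPV depends heavily on prevalence, the same sensitivity and specificity yield very different PPVs in a balanced case-control sample and in the general population.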

Conclusion

The performance of our SMI prediction models, constructed using health insurance claims or EHR data, suggests that using real-world healthcare data for large-scale screening of SMIs in the general population is feasible. In addition, our analysis showed cross-data-source generalizability of machine learning models trained on real-world healthcare data. Models constructed from insurance claims appear to be transferable to EHR cohorts and vice versa.
Source journal
Schizophrenia Research (Medicine: Psychiatry)
CiteScore: 7.50
Self-citation rate: 8.90%
Articles published: 429
Review time: 10.2 weeks
About the journal: As the official journal of the Schizophrenia International Research Society (SIRS), Schizophrenia Research is the journal of choice for international researchers and clinicians to share their work with the global schizophrenia research community. More than 6000 institutes have online or print (or both) access to this journal, the largest specialist journal in the field, with the largest readership. Schizophrenia Research's time to first decision is as fast as 6 weeks, and its publishing speed is as fast as 4 weeks from acceptance to online publication (corrected proof/Article in Press) and 14 weeks from acceptance to publication in a printed issue. The journal publishes novel papers that contribute to understanding the biology and treatment of schizophrenic disorders; Schizophrenia Research brings together biological, clinical and psychological research in order to stimulate the synthesis of findings from all disciplines involved in improving patient outcomes in schizophrenia.