Dianbo Liu, Karmel W. Choi, Paulo Lizano, William Yuan, Kun-Hsing Yu, Jordan Smoller, Isaac Kohane
Schizophrenia Research, Volume 283, Pages 59-66. Published 2025-07-07. Journal Article. DOI: 10.1016/j.schres.2025.06.024. URL: https://www.sciencedirect.com/science/article/pii/S0920996425002403
Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data
Importance
The prevalence of severe mental illnesses (SMIs) in the United States is approximately 3 % of the whole population. The ability to conduct risk screening of SMIs at large scale could inform early prevention and treatment.
Objective
A scalable machine learning-based tool was developed to conduct population-level risk screening for SMIs, including schizophrenia, schizoaffective disorders, psychosis, and bipolar disorders, using 1) healthcare insurance claims and 2) electronic health records (EHRs).
Design, setting and participants
Data were drawn from beneficiaries of a nationwide commercial healthcare insurer with 77.4 million members and from the EHRs of patients at eight academic hospitals in the U.S. First, predictive models were constructed and tested using case-control cohorts from insurance claims or EHR data. Second, the performance of the predictive models across data sources was analyzed. Third, as an illustrative application, the models were further trained to predict the risk of SMIs among 18-year-old young adults and among individuals with substance-associated conditions.
Main outcomes and measures
Machine learning-based predictive models for SMIs in the general population were built from insurance claims and EHR data.
Results
A total of 301,221 patients with SMIs and 2,439,890 control individuals were retrieved from the nationwide health insurance claims database in the U.S. A total of 59,319 patients with SMIs and 297,993 control individuals were retrieved from EHRs spanning eight hospitals in a major integrated healthcare system in Massachusetts, U.S. The resulting predictive models for SMIs achieved an AUCROC of 0.76, specificity of 79.1 % and sensitivity of 61.9 % on an independent test set of an all-age case-control cohort from insurance claims data, and an AUCROC of 0.83, specificity of 85.1 % and sensitivity of 66.4 % using EHR data. The models fine-tuned for specific use cases outperformed two rule-based benchmark methods when predicting 12-month risk of SMIs among 18-year-old young adults, but performed worse than the benchmark methods when predicting SMIs among individuals with substance-associated conditions in claims data.
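As a rough illustration only (this calculation is not reported in the abstract), the sensitivity and specificity above can be combined with the approximately 3 % population prevalence cited under Importance to estimate a positive predictive value (PPV) via Bayes' rule. The sketch assumes that the operating point measured on the case-control test sets would carry over unchanged to a general-population setting, which the abstract does not establish; the function name `positive_predictive_value` is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(case | flagged) for a binary screen.

    All arguments are fractions in [0, 1].
    """
    true_pos = sensitivity * prevalence                  # flagged and truly a case
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # flagged but not a case
    return true_pos / (true_pos + false_pos)


# Figures taken from the abstract; the 3 % prevalence is the approximate
# U.S. population figure, NOT the case fraction in the study cohorts.
PREVALENCE = 0.03

ppv_claims = positive_predictive_value(sensitivity=0.619, specificity=0.791, prevalence=PREVALENCE)
ppv_ehr = positive_predictive_value(sensitivity=0.664, specificity=0.851, prevalence=PREVALENCE)

# Case fraction actually present in each case-control cohort, for comparison.
cohort_frac_claims = 301_221 / (301_221 + 2_439_890)
cohort_frac_ehr = 59_319 / (59_319 + 297_993)

if __name__ == "__main__":
    print(f"Claims model PPV at 3% prevalence: {ppv_claims:.3f}")
    print(f"EHR model PPV at 3% prevalence:    {ppv_ehr:.3f}")
    print(f"Case fraction in claims cohort:    {cohort_frac_claims:.3f}")
    print(f"Case fraction in EHR cohort:       {cohort_frac_ehr:.3f}")
```

Under these assumptions the PPV comes out well below the sensitivity figure, because controls vastly outnumber cases at 3 % prevalence. The cohort case fractions are higher than 3 %, which is why metrics from a case-control test set should not be read directly as population-level screening yield.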
Conclusion
The performance of our SMI prediction models constructed using health insurance claims or EHR data suggests the feasibility of using real-world healthcare data for large-scale screening of SMIs in the general population. In addition, our analysis showed cross-data-source generalizability of machine learning models trained on real-world healthcare data. Models constructed from insurance claims appear to be transferable to EHR cohorts and vice versa.
About the journal:
As the official journal of the Schizophrenia International Research Society (SIRS), Schizophrenia Research is the journal of choice for international researchers and clinicians to share their work with the global schizophrenia research community. More than 6000 institutes have online or print (or both) access to this journal, the largest specialist journal in the field, with the largest readership.
Schizophrenia Research's time to first decision is as fast as 6 weeks, and its publishing speed is as fast as 4 weeks from acceptance to online publication (corrected proof/Article in Press) and 14 weeks from acceptance to publication in a printed issue.
The journal publishes novel papers that really contribute to understanding the biology and treatment of schizophrenic disorders; Schizophrenia Research brings together biological, clinical and psychological research in order to stimulate the synthesis of findings from all disciplines involved in improving patient outcomes in schizophrenia.