Novel integration of governmental data sources using machine learning to identify super-utilization among U.S. counties

Intelligence-based medicine, Pub Date: 2023-01-01, DOI: 10.1016/j.ibmed.2023.100093

Iben M. Ricket, Michael E. Matheny, Todd A. MacKenzie, Jennifer A. Emond, Kusum L. Ailawadi, Jeremiah R. Brown

Volume 7, Article 100093. Full text: https://www.sciencedirect.com/science/article/pii/S2666521223000078 (PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/d0/83/nihms-1909855.PMC10358365.pdf)

Citations: 1

Abstract


Background

Super-utilizers consume the greatest share of resource-intensive healthcare (RIHC), and reducing their utilization remains a crucial challenge for healthcare systems in the United States (U.S.). The objective of this study was to predict RIHC among U.S. counties using routinely collected data from the U.S. government, including information on consumer spending, thereby offering an alternative method for identifying super-utilization among population units rather than individuals.

Methods

Cross-sectional data from 5 governmental sources in 2017 were used in a machine learning pipeline, in which target-prediction features were selected and then used in 4 distinct algorithms. Outcome metrics of RIHC utilization came from the American Hospital Association and comprised yearly (1) emergency room visits, (2) inpatient days, and (3) hospital expenditures. Target-prediction features included 149 demographic characteristics from the U.S. Census Bureau, 151 adult and child health characteristics from the Centers for Disease Control and Prevention, 151 community characteristics from the American Community Survey, and 571 consumer expenditure variables from the Bureau of Labor Statistics. SHAP analysis identified the important target-prediction features for each of the 3 RIHC outcome metrics.
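The abstract does not describe how the sources were linked. As a minimal sketch of one way county-level tables could be combined into a single modeling table, the Python below assumes every source is keyed by a 5-digit county FIPS code, joins the sources with an inner join, and converts yearly totals into per-capita outcomes like those reported in the Results. The function names, the FIPS key, the source prefixes, and the toy values are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict

# A county-keyed table: county key -> {feature name: value}.
# The key is assumed here to be a 5-digit county FIPS code (hypothetical choice).
CountyTable = Dict[str, Dict[str, float]]


def integrate_sources(sources: Dict[str, CountyTable]) -> CountyTable:
    """Inner-join county-keyed tables, prefixing each feature with its source name."""
    if not sources:
        return {}
    # Keep only counties present in every source (an inner join).
    shared = set.intersection(*(set(table) for table in sources.values()))
    merged: CountyTable = {}
    for county in sorted(shared):
        row: Dict[str, float] = {}
        for source_name, table in sources.items():
            for feature, value in table[county].items():
                # The prefix prevents collisions when two sources reuse a feature name.
                row[f"{source_name}.{feature}"] = value
        merged[county] = row
    return merged


def per_capita(totals: Dict[str, float], population: Dict[str, float]) -> Dict[str, float]:
    """Turn yearly county totals (e.g. ER visits) into per-capita rates.

    Counties with a missing or zero population are skipped.
    """
    return {
        county: total / population[county]
        for county, total in totals.items()
        if population.get(county, 0) > 0
    }


if __name__ == "__main__":
    # Made-up toy values for illustration only; not data from the study.
    census = {"01001": {"median_age": 38.0}, "01003": {"median_age": 43.0}}
    cdc = {"01001": {"obesity_pct": 33.0}, "01005": {"obesity_pct": 40.0}}
    print(integrate_sources({"census": census, "cdc": cdc}))
    # {'01001': {'census.median_age': 38.0, 'cdc.obesity_pct': 33.0}}
    print(per_capita({"01001": 27000.0}, {"01001": 54000.0}))
    # {'01001': 0.5}
```

An inner join drops any county missing from even one source; an outer join with imputation would keep more counties at the cost of filled-in values. The abstract does not say which trade-off the study made.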

Results

A total of 2475 counties with emergency rooms and 2491 counties with hospitals were included. The median yearly number of emergency room visits per capita was 0.450 [IQR: 0.318, 0.618], the median number of inpatient days per capita was 0.368 [IQR: 0.176, 0.826], and the median hospital expenditure per capita was $2104 [IQR: $1299.93, $3362.97]. The coefficient of determination (R²), calculated on the test set, ranged from 0.267 to 0.447. Demographic and community characteristics were among the important predictors for all 3 RIHC outcome metrics.
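The summary statistics above can be sketched with Python's standard `statistics` module. The code below computes a median with its interquartile range, and the test-set coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res sums the squared prediction errors and SS_tot sums the squared deviations of the observed test values from their mean. The function names are hypothetical, and the quartile convention is an assumption because the abstract does not specify one.

```python
import statistics
from typing import List, Tuple


def median_iqr(values: List[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3) for a list of per-capita county values.

    method="inclusive" interpolates linearly between order statistics, matching
    NumPy's default percentile rule. The abstract does not state which quartile
    convention the authors used, so bounds may differ slightly from theirs.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot, on held-out data."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(median_iqr([1, 2, 3, 4, 5]))      # (3.0, 2.0, 4.0)
    print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
    print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
    print(r_squared([1, 2, 3], [3, 2, 1]))  # -3.0
```

Because R² compares a model against simply predicting the mean of the test set, a value of 0 means no improvement over that baseline and negative values mean the model does worse. On that reading, the reported range of 0.267 to 0.447 indicates that the models account for roughly 27% to 45% of the variance in the held-out counties' outcomes.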

Conclusions

By integrating diverse population characteristics from numerous governmental sources, we predicted 3 outcome metrics of RIHC among U.S. counties with good performance (test-set R² of 0.267 to 0.447), offering a novel and actionable tool for identifying super-utilizer segments of the population. Wider integration of routinely collected data could be used to develop alternative methods for predicting RIHC among population units.


Source journal

Intelligence-based medicine (Health Informatics)

CiteScore: 5.00

Self-citation rate: 0.00%

Review time: 187 days