A Practical Approach to Disease Risk Prediction: Focus on High-Risk Patients via Highest-k Loss

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine. Pub Date: 2023-12-01. Epub Date: 2024-01-18. DOI: 10.1109/bibm58861.2023.10385816

Hongyi Yang, Rich Gonzalez, Brahmajee K Nallamothu, Keith D Aaronson, Kevin R Ward, Alfred O Hero, Sardar Ansari

Volume 2023, pages 3226-3233. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821551/pdf/

Citations: 0

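Abstract

Disease risk prediction models play an important role in preventing disease development in modern healthcare. However, a lack of focus on high-risk patients has hindered the large-scale practical application of these models, especially given the limited medical resources available for following up on patients who are deemed high-risk. In this study, we propose a novel and practical approach that focuses on minimizing the number of false positive observations among high-risk patients by introducing the Highest-k Loss. The solution is to estimate the weights of the highest k scores with a differentiable estimation of the sorting operation and apply those weights to the loss function. We extracted 253,680 survey responses from a public dataset of the U.S. health survey system to define a diabetes prediction task. The study employs nested cross-validation as well as an aggregated model applied to an independent test set to systematically evaluate the proposed method. Compared with traditional binary cross-entropy loss and Focal loss, the Highest-k loss improved the precision (positive predictive value) for the highest 1% of scores by 0.05 (95% CI: 0.041-0.055), for the highest 5% of scores by 0.03 (95% CI: 0.024-0.032), and for the highest 10% of scores by 0.02 (95% CI: 0.016-0.021). The Highest-k loss function addresses this shortcoming of prevailing risk prediction models and offers a practical solution that focuses on the patients with the k highest predictive scores, who can realistically receive an intervention, as opposed to the entire patient population.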
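
The abstract describes weighting the loss by a differentiable estimate of which scores are among the highest k, but it does not give the formulation. The sketch below is one possible reading and not the authors' method: it assumes a pairwise-sigmoid "soft rank" as the differentiable stand-in for sorting, and the names `soft_top_k_weights`, `highest_k_bce`, `tau`, and `rank_tau` are hypothetical. It is written in plain Python to show the arithmetic; an actual training setup would presumably use an autodiff framework.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_top_k_weights(scores, k, tau=0.05, rank_tau=0.1):
    """Smooth weights: near 1 for the k highest scores, near 0 otherwise.

    Assumption: a pairwise-sigmoid soft rank replaces the paper's
    differentiable sorting estimate, which the abstract does not specify.
    """
    weights = []
    for i, s_i in enumerate(scores):
        # Soft rank = 1 + smooth count of the other scores that exceed s_i.
        soft_rank = 1.0 + sum(
            sigmoid((s_j - s_i) / tau) for j, s_j in enumerate(scores) if j != i
        )
        # Smooth indicator of "soft rank <= k".
        weights.append(sigmoid((k + 0.5 - soft_rank) / rank_tau))
    return weights


def highest_k_bce(scores, labels, k, tau=0.05, rank_tau=0.1, eps=1e-12):
    """Binary cross-entropy averaged with the soft top-k weights."""
    weights = soft_top_k_weights(scores, k, tau, rank_tau)
    total = 0.0
    for p, y, w in zip(scores, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / (sum(weights) + eps)


scores = [0.9, 0.8, 0.3, 0.2, 0.1]
loss_no_fp = highest_k_bce(scores, [1, 1, 0, 0, 0], k=2)
loss_with_fp = highest_k_bce(scores, [1, 0, 0, 0, 0], k=2)
print(loss_no_fp < loss_with_fp)
```

Under these assumptions, a false positive among the two highest scores raises the loss sharply, while a mislabeled low-scoring patient contributes almost nothing, which matches the stated goal of concentrating on the patients who would actually be followed up.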


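
The reported metric is precision (positive predictive value) among the highest 1%, 5%, and 10% of scores. A minimal sketch of how such a metric could be computed follows; the function name `precision_at_top_fraction` and the `ceil` cutoff rule are assumptions, since the abstract does not state how the cutoff count or ties were handled. The data in the usage lines is made up for illustration only.

```python
import math


def precision_at_top_fraction(scores, labels, fraction):
    """Precision (PPV) among the highest-scoring `fraction` of patients.

    Assumptions: the cutoff count is ceil(fraction * n), and ties keep
    their original order because Python's sort is stable.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    k = max(1, math.ceil(fraction * n))
    # Indices ordered from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    # Share of true positives among the k highest-scoring patients.
    return sum(labels[i] for i in top) / k


# Illustrative toy data, not taken from the study.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_top_fraction(toy_scores, toy_labels, 0.2))
print(precision_at_top_fraction(toy_scores, toy_labels, 0.5))
```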
