Brian Critelli, Amier Hassan, Ila Lahooti, Lydia Noh, Jun Sung Park, Kathleen Tong, Ali Lahooti, Nathan Matzko, Jan Niklas Adams, Lukas Liss, Justin Quion, David Restrepo, Melica Nikahd, Stacey Culp, Adam Lacy-Hulbert, Cate Speake, James Buxbaum, Jason Bischof, Cemal Yazici, Anna Evans-Phillips, Sophie Terp, Alexandra Weissman, Darwin Conwell, Philip Hart, Mitchell Ramsey, Somashekar Krishna, Samuel Han, Erica Park, Raj Shah, Venkata Akshintala, John A Windsor, Nikhil K Mull, Georgios Papachristou, Leo Anthony Celi, Peter Lee
PLoS Medicine, 22(2): e1004432. Published 2025-02-24. DOI: 10.1371/journal.pmed.1004432. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11870378/pdf/
A systematic review of machine learning-based prognostic models for acute pancreatitis: Towards improving methods and reporting quality.
Background: An accurate prognostic tool is essential to aid clinical decision-making (e.g., patient triage) and to advance personalized medicine. However, such a prognostic tool is lacking for acute pancreatitis (AP). Machine learning (ML) techniques are increasingly being used to develop high-performing prognostic models for AP. However, methodological and reporting quality have received little attention. High-quality reporting and study methodology are critical for model validity, reproducibility, and clinical implementation. In collaboration with content experts in ML methodology, we performed a systematic review critically appraising the quality of methodology and reporting of recently published ML AP prognostic models.
Methods/findings: Using a validated search strategy, we identified ML AP studies published between January 2021 and December 2023 in the MEDLINE and EMBASE databases. We also searched the pre-print servers medRxiv, bioRxiv, and arXiv for pre-prints registered between January 2021 and December 2023. Eligibility criteria included all retrospective or prospective studies that developed or validated new or existing ML models in patients with AP that predicted an outcome following an episode of AP. Meta-analysis was considered if there was homogeneity in the study design and in the type of outcome predicted. For risk of bias (ROB) assessment, we used the Prediction Model Risk of Bias Assessment Tool (PROBAST). Quality of reporting was assessed using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD+AI) statement, which defines standards for 27 items that should be reported in publications using ML prognostic models. The search strategy identified 6,480 publications, of which 30 met the eligibility criteria. Studies originated from China (22), the United States (4), and other countries (4). All 30 studies developed a new ML model and none sought to validate an existing ML model, producing a total of 39 new ML models. AP severity (23/39) or mortality (6/39) were the most commonly predicted outcomes. The mean area under the curve (AUC) across all models and endpoints was 0.91 (SD 0.08). The ROB was high in at least one domain for all 39 models, particularly the analysis domain (37/39 models). Steps were not taken to minimize over-optimistic model performance in 27/39 models. Due to heterogeneity in the study design and in how the outcomes were defined and determined, meta-analysis was not performed. Studies reported only 15/27 items from the TRIPOD+AI standards, with only 7/30 justifying sample size and 13/30 assessing data quality.
Other reporting deficiencies included omissions regarding human-AI interaction (28/30), handling of low-quality or incomplete data in practice (27/30), sharing of analytical code (25/30) and study protocols (25/30), and reporting of source data (19/30).
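One analysis-domain safeguard the review assesses is whether model performance was estimated in a way that avoids over-optimism. A minimal sketch (with an illustrative synthetic dataset and model, not taken from any reviewed study) contrasting a training-set AUC with a cross-validated AUC:

```python
# Sketch: why scoring a model on its own training data over-states performance,
# and how cross-validation gives a more honest AUC estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

# Synthetic binary-outcome dataset standing in for a clinical cohort.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Over-optimistic: fit and score on the same data.
model.fit(X, y)
train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Safeguard: 5-fold cross-validated AUC, scored only on held-out folds.
cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

print(f"training AUC:        {train_auc:.3f}")  # typically near 1.0 here
print(f"cross-validated AUC: {cv_auc:.3f}")     # a more realistic estimate
```

The gap between the two numbers is exactly the over-optimism that internal validation (cross-validation, bootstrapping, or a held-out set) is meant to expose; 27/39 of the reviewed models took no such step.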
Conclusions: There are significant deficiencies in the methodology and reporting of recently published ML-based prognostic models in AP patients. These undermine the validity, reproducibility, and implementation of these prognostic models despite their promise of superior predictive accuracy.
Registration: Research Registry (reviewregistry1727).
Journal Introduction:
PLOS Medicine is a prominent platform for discussing and researching global health challenges. The journal covers a wide range of topics, including biomedical, environmental, social, and political factors affecting health. It prioritizes articles that contribute to clinical practice, health policy, or a better understanding of pathophysiology, ultimately aiming to improve health outcomes across different settings.
The journal is unwavering in its commitment to uphold the highest ethical standards in medical publishing. This includes actively managing and disclosing any conflicts of interest related to reporting, reviewing, and publishing. PLOS Medicine promotes transparency in the entire review and publication process, and encourages data sharing and the reuse of published work. Additionally, authors retain copyright for their work, and the publication is made accessible through Open Access with no restrictions on availability and dissemination.
PLOS Medicine takes measures to avoid conflicts of interest associated with advertising drugs and medical devices or engaging in the exclusive sale of reprints.