{"title":"The red blood cell distribution width to albumin ratio was a potential prognostic biomarker for acute respiratory failure: a retrospective study","authors":"Qian He, Song Hu, Jun Xie, Hui Liu, Chong Li","doi":"10.1186/s12911-024-02639-4","DOIUrl":"https://doi.org/10.1186/s12911-024-02639-4","url":null,"abstract":"The association between red blood cell distribution width (RDW) to albumin ratio (RAR) and prognosis in patients with acute respiratory failure (ARF) admitted to the Intensive Care Unit (ICU) remains unclear. This retrospective cohort study aims to investigate this association. Clinical information of ARF patients was collected from the Medical Information Mart for Intensive Care IV (MIMIC-IV) version 2.0 database. The primary outcome was in-hospital mortality, and secondary outcomes included 28-day mortality, 60-day mortality, length of hospital stay, and length of ICU stay. Cox regression models and subgroup analyses were conducted to explore the relationship between RAR and mortality. A total of 4547 patients with acute respiratory failure were enrolled, with 2277 in the low ratio group (RAR < 4.83) and 2270 in the high ratio group (RAR ≥ 4.83). Kaplan-Meier survival analysis demonstrated a significant difference in survival probability between the two groups. After adjusting for confounding factors, the Cox regression analysis showed that the high RAR group had a higher hazard ratio (HR) for in-hospital mortality (HR 1.22, 95% CI 1.07–1.40; P = 0.003), as well as for 28-day mortality and 60-day mortality. Propensity score-matched (PSM) analysis further supported the finding that high RAR was an independent risk factor for ARF. This study reveals that RAR is an independent risk factor for poor clinical prognosis in patients with ARF admitted to the ICU. 
Higher RAR levels were associated with increased in-hospital, 28-day and 60-day mortality rates.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing computational tools of the digital era for enhanced infection control","authors":"Francesco Branda","doi":"10.1186/s12911-024-02650-9","DOIUrl":"https://doi.org/10.1186/s12911-024-02650-9","url":null,"abstract":"This paper explores the potential of artificial intelligence, machine learning, and big data analytics in revolutionizing infection control. It addresses the challenges and innovative approaches in combating infectious diseases and antimicrobial resistance, emphasizing the critical role of interdisciplinary collaboration, ethical data practices, and integration of advanced computational tools in modern healthcare.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.5,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-based prognostic model for 30-day mortality prediction in Sepsis-3","authors":"Md. Sohanur Rahman, Khandaker Reajul Islam, Johayra Prithula, Jaya Kumar, Mufti Mahmud, Mohammed Fasihul Alam, Mamun Bin Ibne Reaz, Abdulrahman Alqahtani, Muhammad E. H. Chowdhury","doi":"10.1186/s12911-024-02655-4","DOIUrl":"https://doi.org/10.1186/s12911-024-02655-4","url":null,"abstract":"Sepsis poses a critical threat to hospitalized patients, particularly those in the Intensive Care Unit (ICU). Rapid identification of Sepsis is crucial for improving survival rates. Machine learning techniques offer advantages over traditional methods for predicting outcomes. This study aimed to develop a prognostic model using a Stacking-based Meta-Classifier to predict 30-day mortality risks in Sepsis-3 patients from the MIMIC-III database. A cohort of 4,240 Sepsis-3 patients was analyzed, with 783 experiencing 30-day mortality and 3,457 surviving. Fifteen biomarkers were selected using feature ranking methods, including Extreme Gradient Boosting (XGBoost), Random Forest, and Extra Tree, and the Logistic Regression (LR) model was used to assess their individual predictability with a fivefold cross-validation approach for the validation of the prediction. The dataset was balanced using the SMOTE-TOMEK LINK technique, and a stacking-based meta-classifier was used for 30-day mortality prediction. The SHapley Additive explanations analysis was performed to explain the model’s prediction. Using the LR classifier, the model achieved an area under the curve or AUC score of 0.99. A nomogram provided clinical insights into the biomarkers' significance. The stacked meta-learner, LR classifier exhibited the best performance with 95.52% accuracy, 95.79% precision, 95.52% recall, 93.65% specificity, and a 95.60% F1-score. In conjunction with the nomogram, the proposed stacking classifier model effectively predicted 30-day mortality in Sepsis patients. 
This approach holds promise for early intervention and improved outcomes in treating Sepsis cases.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.5,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of anterior segment in primary angle closure suspect with deep learning models","authors":"Ziwei Fu, Jinwei Xi, Zhi Ji, Ruxue Zhang, Jianping Wang, Rui Shi, Xiaoli Pu, Jingni Yu, Fang Xue, Jianrong Liu, Yanrong Wang, Hua Zhong, Jun Feng, Min Zhang, Yuan He","doi":"10.1186/s12911-024-02658-1","DOIUrl":"https://doi.org/10.1186/s12911-024-02658-1","url":null,"abstract":"To analyze the anatomical characteristics of anterior chamber configuration in primary angle closure suspect (PACS) patients, and to establish an artificial intelligence (AI)-aided diagnostic system for PACS screening. A total of 1668 scans of 839 patients were included in this cross-sectional study. The subjects were divided into two groups: PACS group and normal group. With anterior segment optical coherence tomography scans, the anatomical differences between the two groups were compared, and anterior segment structure features of PACS were extracted. Then, an AI-aided diagnostic system was constructed based on different algorithms such as classification and regression tree (CART), random forest (RF), logistic regression (LR), VGG-16 and Alexnet. Then the diagnostic efficiencies of the different algorithms were evaluated and compared with those of junior physicians and experienced ophthalmologists. RF [sensitivity (Se) = 0.84; specificity (Sp) = 0.92; positive predictive value (PPV) = 0.82; negative predictive value (NPV) = 0.95; area under the curve (AUC) = 0.90] and CART (Se = 0.76, Sp = 0.93, PPV = 0.85, NPV = 0.92, AUC = 0.90) showed better performance than LR (Se = 0.68, Sp = 0.91, PPV = 0.79, NPV = 0.90, AUC = 0.86). In convolutional neural networks (CNN), Alexnet (Se = 0.83, Sp = 0.95, PPV = 0.92, NPV = 0.87, AUC = 0.85) was better than VGG-16 (Se = 0.84, Sp = 0.90, PPV = 0.85, NPV = 0.90, AUC = 0.79). The performance of the 2 CNN algorithms was better than that of 5 junior physicians, and the mean values of the diagnostic indicators of the 2 CNN algorithms were similar to those of experienced ophthalmologists. 
PACS patients have distinct anatomical characteristics compared with healthy controls. AI models for PACS screening are reliable and powerful, equivalent to experienced ophthalmologists.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.5,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinician voices on ethics of LLM integration in healthcare: a thematic analysis of ethical concerns and implications","authors":"Tala Mirzaei, Leila Amini, Pouyan Esmaeilzadeh","doi":"10.1186/s12911-024-02656-3","DOIUrl":"https://doi.org/10.1186/s12911-024-02656-3","url":null,"abstract":"This study aimed to explain and categorize key ethical concerns about integrating large language models (LLMs) in healthcare, drawing particularly from the perspectives of clinicians in online discussions. We analyzed 3049 posts and comments extracted from a self-identified clinician subreddit using unsupervised machine learning via Latent Dirichlet Allocation and a structured qualitative analysis methodology. Analysis uncovered 14 salient themes of ethical implications, which we further consolidated into 4 overarching domains reflecting ethical issues around various clinical applications of LLM in healthcare, LLM coding, algorithm, and data governance, LLM’s role in health equity and the distribution of public health services, and the relationship between users (human) and LLM systems (machine). Mapping themes to ethical frameworks in literature illustrated multifaceted issues covering transparent LLM decisions, fairness, privacy, access disparities, user experiences, and reliability. 
This study emphasizes the need for ongoing ethical review from stakeholders to ensure responsible innovation and advocates for tailored governance to enhance LLM use in healthcare, aiming to improve clinical outcomes ethically and effectively.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.5,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data privacy-aware machine learning approach in pancreatic cancer diagnosis.","authors":"Ömer Faruk Akmeşe","doi":"10.1186/s12911-024-02657-2","DOIUrl":"10.1186/s12911-024-02657-2","url":null,"abstract":"<p><strong>Problem: </strong>Pancreatic ductal adenocarcinoma (PDAC) is considered a highly lethal cancer due to its advanced stage diagnosis. The five-year survival rate after diagnosis is less than 10%. However, if diagnosed early, the five-year survival rate can reach up to 70%. Early diagnosis of PDAC can aid treatment and improve survival rates by taking necessary precautions. The challenge is to develop a reliable, data privacy-aware machine learning approach that can accurately diagnose pancreatic cancer with biomarkers.</p><p><strong>Aim: </strong>The study aims to diagnose a patient's pancreatic cancer while ensuring the confidentiality of patient records. In addition, the study aims to guide researchers and clinicians in developing innovative methods for diagnosing pancreatic cancer.</p><p><strong>Methods: </strong>Machine learning, a branch of artificial intelligence, can identify patterns by analyzing large datasets. The study pre-processed a dataset containing urine biomarkers with operations such as filling in missing values, cleaning outliers, and feature selection. The data was encrypted using the Fernet encryption algorithm to ensure confidentiality. Ten separate machine learning models were applied to predict individuals with PDAC. Performance metrics such as F1 score, recall, precision, and accuracy were used in the modeling process.</p><p><strong>Results: </strong>Among the 590 clinical records analyzed, 199 (33.7%) belonged to patients with pancreatic cancer, 208 (35.3%) to patients with non-cancerous pancreatic disorders (such as benign hepatobiliary disease), and 183 (31%) to healthy individuals. The LGBM algorithm showed the highest efficiency by achieving an accuracy of 98.8%. 
The accuracy of the other algorithms ranged from 98 to 86%. In order to understand which features are more critical and which data the model is based on, the analysis found that the features \"plasma_CA19_9\", REG1A, TFF1, and LYVE1 have high importance levels. The LIME analysis also analyzed which features of the model are important in the decision-making process.</p><p><strong>Conclusions: </strong>This research outlines a data privacy-aware machine learning tool for predicting PDAC. The results show that a promising approach can be presented for clinical application. Future research should expand the dataset and focus on validation by applying it to various populations.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375871/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142139379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthy and ethical AI-enabled cardiovascular care: a rapid review.","authors":"Maryam Mooghali, Austin M Stroud, Dong Whi Yoo, Barbara A Barry, Alyssa A Grimshaw, Joseph S Ross, Xuan Zhu, Jennifer E Miller","doi":"10.1186/s12911-024-02653-6","DOIUrl":"10.1186/s12911-024-02653-6","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) is increasingly used for prevention, diagnosis, monitoring, and treatment of cardiovascular diseases. Despite the potential for AI to improve care, ethical concerns and mistrust in AI-enabled healthcare exist among the public and medical community. Given the rapid and transformative recent growth of AI in cardiovascular care, to inform practice guidelines and regulatory policies that facilitate ethical and trustworthy use of AI in medicine, we conducted a literature review to identify key ethical and trust barriers and facilitators from patients' and healthcare providers' perspectives when using AI in cardiovascular care.</p><p><strong>Methods: </strong>In this rapid literature review, we searched six bibliographic databases to identify publications discussing transparency, trust, or ethical concerns (outcomes of interest) associated with AI-based medical devices (interventions of interest) in the context of cardiovascular care from patients', caregivers', or healthcare providers' perspectives. The search was completed on May 24, 2022 and was not limited by date or study design.</p><p><strong>Results: </strong>After reviewing 7,925 papers from six databases and 3,603 papers identified through citation chasing, 145 articles were included. 
Key ethical concerns included privacy, security, or confidentiality issues (n = 59, 40.7%); risk of healthcare inequity or disparity (n = 36, 24.8%); risk of patient harm (n = 24, 16.6%); accountability and responsibility concerns (n = 19, 13.1%); problematic informed consent and potential loss of patient autonomy (n = 17, 11.7%); and issues related to data ownership (n = 11, 7.6%). Major trust barriers included data privacy and security concerns, potential risk of patient harm, perceived lack of transparency about AI-enabled medical devices, concerns about AI replacing human aspects of care, concerns about prioritizing profits over patients' interests, and lack of robust evidence related to the accuracy and limitations of AI-based medical devices. Ethical and trust facilitators included ensuring data privacy and data validation, conducting clinical trials in diverse cohorts, providing appropriate training and resources to patients and healthcare providers and improving their engagement in different phases of AI implementation, and establishing further regulatory oversights.</p><p><strong>Conclusion: </strong>This review revealed key ethical concerns and barriers and facilitators of trust in AI-enabled medical devices from patients' and healthcare providers' perspectives. Successful integration of AI into cardiovascular care necessitates implementation of mitigation strategies. 
These strategies should focus on enhanced regulatory oversight on the use of patient data and promoting transparency around the use of AI in patient care.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373417/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142131912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"lab2clean: a novel algorithm for automated cleaning of retrospective clinical laboratory results data for secondary uses.","authors":"Ahmed Medhat Zayed, Arne Janssens, Pavlos Mamouris, Nicolas Delvaux","doi":"10.1186/s12911-024-02652-7","DOIUrl":"10.1186/s12911-024-02652-7","url":null,"abstract":"<p><strong>Background: </strong>The integrity of clinical research and machine learning models in healthcare heavily relies on the quality of underlying clinical laboratory data. However, the preprocessing of this data to ensure its reliability and accuracy remains a significant challenge due to variations in data recording and reporting standards.</p><p><strong>Methods: </strong>We developed lab2clean, a novel algorithm aimed at automating and standardizing the cleaning of retrospective clinical laboratory results data. lab2clean was implemented as two R functions specifically designed to enhance data conformance and plausibility by standardizing result formats and validating result values. The functionality and performance of the algorithm were evaluated using two extensive electronic medical record (EMR) databases, encompassing various clinical settings.</p><p><strong>Results: </strong>lab2clean effectively reduced the variability of laboratory results and identified potentially erroneous records. Upon deployment, it demonstrated effective and fast standardization and validation of substantial laboratory data records. The evaluation highlighted significant improvements in the conformance and plausibility of lab results, confirming the algorithm's efficacy in handling large-scale data sets.</p><p><strong>Conclusions: </strong>lab2clean addresses the challenge of preprocessing and cleaning clinical laboratory data, a critical step in ensuring high-quality data for research outcomes. It offers a straightforward, efficient tool for researchers, improving the quality of clinical laboratory data, a major portion of healthcare data, 
thereby enhancing the reliability and reproducibility of clinical research outcomes and clinical machine learning models. Future developments aim to broaden its functionality and accessibility, solidifying its vital role in healthcare data management.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142124888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Factors affecting the survival of prediabetic patients: comparison of Cox proportional hazards model and random survival forest method.","authors":"Mehdi Sharafi, Mohammad Ali Mohsenpour, Sima Afrashteh, Mohammad Hassan Eftekhari, Azizallah Dehghan, Akram Farhadi, Aboubakr Jafarnezhad, Abdoljabbar Zakeri, Mehdi Azizmohammad Looha","doi":"10.1186/s12911-024-02648-3","DOIUrl":"10.1186/s12911-024-02648-3","url":null,"abstract":"<p><strong>Background: </strong>The worldwide prevalence of type 2 diabetes mellitus in adults is experiencing a rapid increase. This study aimed to identify the factors affecting the survival of prediabetic patients using a comparison of the Cox proportional hazards model (CPH) and the Random survival forest (RSF).</p><p><strong>Method: </strong>This prospective cohort study was performed on 746 prediabetics in southwest Iran. The demographic, lifestyle, and clinical data of the participants were recorded. The CPH and RSF models were used to determine the patients' survival. Furthermore, the concordance index (C-index) and time-dependent receiver operating characteristic (ROC) curve were employed to compare the performance of the Cox proportional hazards (CPH) model and the random survival forest (RSF) model.</p><p><strong>Results: </strong>The 5-year cumulative T2DM incidence was 12.73%. Based on the results of the CPH model, NAFLD (HR = 1.74, 95% CI: 1.06, 2.85), FBS (HR = 1.008, 95% CI: 1.005, 1.012) and increased abdominal fat (HR = 1.02, 95% CI: 1.01, 1.04) were directly associated with diabetes occurrence in prediabetic patients. The RSF model suggests that factors including FBS, waist circumference, depression, NAFLD, afternoon sleep, and female gender are the most important variables that predict diabetes. 
The C-index indicated that the RSF model has a higher percentage of agreement than the CPH model, and in the weighted Brier Score index, the RSF model had less error than the Kaplan-Meier and CPH model.</p><p><strong>Conclusion: </strong>Our findings show that the incidence of diabetes was alarmingly high in Iran. The results suggested that several demographic and clinical factors are associated with diabetes occurrence in prediabetic patients. The high-risk population needs special measures for screening and care programs.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373449/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142124887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study.","authors":"Yingxia Li, Tobias Herold, Ulrich Mansmann, Roman Hornung","doi":"10.1186/s12911-024-02642-9","DOIUrl":"10.1186/s12911-024-02642-9","url":null,"abstract":"<p><strong>Background: </strong>Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.</p><p><strong>Methods: </strong>In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. 
Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.</p><p><strong>Results: </strong>Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.</p><p><strong>Conclusions: </strong>Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370316/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142119055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}